Abstract. We observe $n$ inhomogeneous Poisson processes with covariates and
aim at estimating their intensities. To handle this problem, we assume that the
intensity of each Poisson process is of the form $s(\cdot, x)$ where $x$ is the covariate and
where $s$ is an unknown function. We propose a model selection approach where
the models are used to approximate the multivariate function $s$. We show that our
estimator satisfies an oracle-type inequality under very weak assumptions both
on the intensities and the models. By using a Hellinger-type loss, we establish
non-asymptotic risk bounds and specify them under various kinds of assumptions
on the target function $s$ such as being smooth or composite. Besides, we show
that our estimation procedure is robust with respect to these assumptions.



We consider $n$ independent Poisson point processes $N_i$ for $i = 1, \dots, n$ with values
in the measurable space $(\mathbb{T}, \mathcal{T})$. For each $i$, we assume that the intensity of $N_i$ with
respect to some reference measure $\mu$ on $(\mathbb{T}, \mathcal{T})$ is of the form $s_i(\cdot) = s(\cdot, x_i)$ where
$x_i$ is a deterministic element of some measurable set $(\mathbb{X}, \mathcal{X})$ and $s$ is a non-negative
function on $\mathbb{T} \times \mathbb{X}$ satisfying



Typically, this corresponds to the modelling of the times of failure of $n$ repairable
systems where the reliability of each of them depends on external factors measured
by some covariates $x_1, \dots, x_n$, in which case $\mathbb{T}$ corresponds to an interval of time,
say $[0, 1]$, and $\mathbb{X}$ to some compact subset of $\mathbb{R}^k$, say $[0, 1]^k$. Our aim is to estimate
$s$ from the observations of the pairs $(N_i, x_i)_{1 \le i \le n}$.

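Let $\mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M)$ be the cone of integrable and non-negative functions on
$(\mathbb{T} \times \mathbb{X}, \mathcal{T} \otimes \mathcal{X})$ equipped with the product measure $M = \mu \otimes \nu_n$ where
$\nu_n = n^{-1} \sum_{i=1}^{n} \delta_{x_i}$. In order to evaluate the risks of our estimators, we endow
$\mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M)$ with the Hellinger-type distance $H$ defined for
$u, v \in \mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M)$ by

$$2 H^2(u, v) = \int_{\mathbb{T} \times \mathbb{X}} \left( \sqrt{u(t,x)} - \sqrt{v(t,x)} \right)^2 \mathrm{d}M(t,x).$$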
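As a rough numerical illustration of this distance, the sketch below approximates $H^2(u, v)$ when $\mathbb{T} = [0, 1]$, $\mu$ is the Lebesgue measure and $\nu_n$ puts mass $1/n$ on each covariate. The function name `hellinger_sq`, the midpoint grid and the two toy intensities are assumptions made for this example only; they are not part of the estimation procedure studied in the paper.

```python
import math

def hellinger_sq(u, v, covariates, grid_size=1000):
    """Approximate H^2(u, v) = (1/2) * integral of (sqrt(u) - sqrt(v))^2 dM,
    where M is Lebesgue measure on [0, 1] times the empirical measure
    of `covariates` (mass 1/n on each point)."""
    total = 0.0
    for x in covariates:
        for j in range(grid_size):
            t = (j + 0.5) / grid_size  # midpoint rule on [0, 1]
            diff = math.sqrt(u(t, x)) - math.sqrt(v(t, x))
            total += diff * diff
    # Divide by grid_size (integral in t) and by n (average over covariates).
    return 0.5 * total / (grid_size * len(covariates))

# Two hypothetical intensities, constant in t and proportional to x.
u = lambda t, x: 4.0 * x
v = lambda t, x: 1.0 * x
xs = [0.25, 1.0]
h2 = hellinger_sq(u, v, xs)
```

With these toy choices, $(\sqrt{u} - \sqrt{v})^2 = x$, so the value is half the average of the covariates.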

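Let $(\mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M), d_2)$ be the metric space of functions $f$ on $\mathbb{T} \times \mathbb{X}$ such that
$f^2$ belongs to $\mathbb{L}^1(\mathbb{T} \times \mathbb{X}, M)$. Given a suitable collection $\mathbb{V}$ of models (not necessarily
linear spaces) and a non-negative mapping $\Delta$ on $\mathbb{V}$ satisfying

$$\sum_{V \in \mathbb{V}} e^{-\Delta(V)} \le 1,$$

we build an estimator $\hat{s}$ whose risk $\mathbb{E}[H^2(s, \hat{s})]$ satisfies

(1)    $C\, \mathbb{E}[H^2(s, \hat{s})] \le \inf_{V \in \mathbb{V}} \left\{ d_2^2(\sqrt{s}, V) + \eta_V^2 + \frac{\Delta(V)}{n} \right\}$

where $C$ is a universal positive constant, $d_2(\sqrt{s}, V)$ is the $\mathbb{L}^2$-distance between $\sqrt{s}$
and $V$ and $n \eta_V^2$ is the metric dimension (in a suitable sense) of $V$. We shall use
this inequality in order to derive risk bounds for our estimator under smoothness or
structural assumptions on the target function $s$.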

In the literature, much attention has been paid to the problem of estimating the
intensity of a Poisson process without covariates. With the $\mathbb{L}^2$-loss, Reynaud-Bouret
(2003) dealt with the problem of model selection among a family of linear spaces
$\mathbb{V}$. Baraud and Birgé (2009) used the Hellinger distance and considered the case
where the sets $V$ consist of piecewise constant functions on a partition of $\mathbb{T}$. More
general models were considered by Birgé (2007), allowing $V$ to be any subset with finite
metric dimension (in a suitable sense). Much less is known when the intensity $s$
depends on covariates. For model selection purposes, the only result we are aware
of is due to Comte et al. (2011), who considered the $\mathbb{L}^2$-loss and penalized
projection estimators on linear spaces. Their approach requires that the intensity $s$
be bounded from above by a quantity that needs to be either known or suitably
estimated. Besides, they impose some restrictions on the family of linear spaces $\mathbb{V}$
so that their estimator possesses minimax properties over classes of functions which
are smooth enough.

Our approach is based on robust testing as developed in the papers of Birgé (2006)
and Baraud (2010). We shall see that our estimator $\hat{s}$ possesses nice (adaptation and
robustness) properties but suffers from the fact that its construction is numerically
intractable. From this point of view, it inherits both the qualities and the drawbacks
of the T-estimators developed by Birgé (2006). We obtain an oracle inequality
of the form (1) under very mild assumptions both on the intensity $s$ and the family
of models $\mathbb{V}$. This allows us to derive risk bounds over a large range of Hölderian
spaces including irregular ones. Nevertheless, we shall also consider functions $s$
defined on a subset $\mathbb{T} \times \mathbb{X}$ of a linear space of large dimension, say $\mathbb{T} \times \mathbb{X} = [0, 1]^{1+k}$
with $k$ large, and it is well known that in such a situation, the minimax approach
based on smoothness assumptions may lead to very slow rates of convergence. This
phenomenon is known as the curse of dimensionality. In this case, an alternative
approach is to assume that $s$ belongs to classes $\mathcal{F}$ of functions satisfying structural
assumptions (such as the multiple index model, the generalized additive model, the
multiplicative model, ...) and for which faster rates of convergence can be achieved.
Very recently, this approach was developed by Juditsky et al. (2009) (in the Gaussian
white noise model) and by Baraud and Birgé (2011) (in more statistical settings).
Nevertheless, unlike Juditsky et al. (2009) we shall not assume that $s$ belongs to $\mathcal{F}$
but rather consider $\mathcal{F}$ as an approximating class for $s$.

In the present paper, our point of view is closer to that developed in Baraud
and Birgé (2011). We shall use our new model selection theorem in conjunction
with suitable families $\mathbb{V}$ of models in order to design an estimator $\hat{s}$ possessing good
statistical properties with respect to many classes of functions of interest, including
classes $\mathcal{F}$ consisting of composite functions $(t, x) \mapsto g(t, u(x))$ and classes
$\mathcal{F}$ consisting of product functions $(t, x) \mapsto g(t) u(x)$. In order to design a
suitable family $\mathbb{V}$ with good approximation properties with respect to the elements
of $\mathcal{F}$, we shall either use the techniques developed in Baraud and Birgé (2011), for
composite functions for instance, or some new ones, for product functions, for which the
former approach by Baraud and Birgé would lead to an extra (and unnecessary)
logarithmic factor in the risk bound. When $s(t, x)$ is of the form (or close to)
$g(t, u(x))$ or $g(t) u(x)$, where $g$ and $u$ are assumed to be smooth, we shall prove that
our estimator is fully adaptive with respect to the regularities of both $g$ and $u$.
We shall also consider structural assumptions on the functions $g$ and $u$ as well as
parametric ones when $t$ and $x$ lie in a space of large dimension. Finally we shall
look at the situation where, for all $x \in \mathbb{X}$, $s(\cdot, x)$ belongs to a parametric class of
functions $\{f_\theta, \theta \in \Theta\}$ with $\Theta \subset \mathbb{R}^k$, which means that there exists some element
$\theta(x) \in \Theta$ such that $s(\cdot, x) = f_{\theta(x)}(\cdot)$, and our aim is then to estimate the mapping
$x \mapsto \theta(x)$ by model selection.

This paper is organized as follows. The general model selection theorem can be
found in Section 2. In Section 3, we study the case where $\mathcal{F}$ is a class of smooth
or composite functions, and in Section 4 the case where $\mathcal{F}$ is a class of product
functions. The problem of estimating $s$ when the intensities of the Poisson processes
$N_i$ all belong to the same parametric model is dealt with in Section 5. Section 6 is devoted
to the proofs.

Let us introduce some notations that will be used throughout the paper. We set
$\mathbb{N}^* = \mathbb{N} \setminus \{0\}$, $\mathbb{R}^* = \mathbb{R} \setminus \{0\}$ and $\mathbb{R}_+^* = \mathbb{R}_+ \cap \mathbb{R}^*$. The numbers $x \wedge y$ and $x \vee y$ stand
for $\min(x, y)$ and $\max(x, y)$ respectively. For $(E, \mathcal{E}, \nu)$ a measure space, we denote
by $\mathbb{L}^2(E, \nu)$ the linear space of measurable functions $f$ such that $\int |f|^2 \mathrm{d}\nu < \infty$.
When $(E, \nu) = (\mathbb{T} \times \mathbb{X}, M)$, the $\mathbb{L}^2$-distance of this set is denoted by $d_2$, and the
norm by $\| \cdot \|_2$. Alternatively, this distance (respectively this norm) is denoted by
$d_t$ (respectively $\| \cdot \|_t$) when $(E, \nu) = (\mathbb{T}, \mu)$, and by $d_x$ (respectively $\| \cdot \|_x$) when
$(E, \nu) = (\mathbb{X}, \nu_n)$. The supremum norm of a bounded function $f$ on a domain $E$
is denoted by $\|f\|_\infty = \sup_{x \in E} |f(x)|$ and the space of all bounded functions on $E$
by $\mathbb{L}^\infty(E)$. For $(E, d)$ a metric space, $x \in E$ and $A \subset E$, the distance between $x$
and $A$ is denoted by $d(x, A) = \inf_{a \in A} d(x, a)$. The closed ball centered at $x \in E$
with radius $r$ is denoted by $B(x, r)$. The cardinality of a finite set $A$ is denoted
by $|A|$. The set $\mathcal{F}$ is a generic notation for a family of functions of $\mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M)$
of special interest. The notations $C, C', C'', \dots$ are for constants, which
may change from line to line.



2. A GENERAL MODEL SELECTION THEOREM 



2.1. Main result. Throughout this paper, a model $V$ is a subset of $\mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M)$
with bounded metric dimension, in the sense of Definition 6 of Birgé (2006). We
recall this definition below.

Definition 2.1. Let $V$ be a subset of $\mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M)$ and $D_V$ a right-continuous
map from $(0, +\infty)$ into $[1/2, +\infty)$ such that $D_V(\eta) = o(\eta^2)$ when $\eta \to +\infty$. We
say that $V$ has a metric dimension bounded by $D_V$ if for all $\eta > 0$, there exists
$S_V(\eta) \subset \mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M)$ such that for all $f \in V$, there exists $g \in S_V(\eta)$
with $d_2(f, g) \le \eta$ and such that

$$\forall \varphi \in \mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M), \ \forall x \ge 2, \quad |S_V(\eta) \cap B(\varphi, x\eta)| \le \exp\left( D_V(\eta) x^2 \right).$$

Moreover, if one can choose $D_V$ as a constant, we say that $V$ has a finite metric
dimension bounded by $D_V$.

This notion is more general than the dimension for linear spaces since a linear
space $V$ with finite dimension (in the usual sense) has a finite metric dimension.
Besides, if $V$ is not reduced to $\{0\}$ one can choose $D_V = \dim V$, which we shall
do throughout this paper. Other models of interest with bounded metric dimension will
appear later in the paper.

Given a collection of such subsets, our approach is based on model selection. We
propose a selection rule based on robust testing in the spirit of the papers of Birgé
(2006) and Baraud (2010). The test and the selection rule, which are mainly abstract,
are postponed to Section 6. The main result is the following.

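Theorem 2.1. Let $\mathbb{V}$ be an at most countable family of models $V$ with bounded
metric dimension $D_V(\cdot)$ and $\Delta$ be some mapping from $\mathbb{V}$ into $\mathbb{R}_+$ such that
$\sum_{V \in \mathbb{V}} e^{-\Delta(V)} \le 1$.

One can build an estimator $\hat{s} \in \mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M)$ depending on $(N_i, x_i)_{1 \le i \le n}$, $\mathbb{V}$ and
$\Delta$ such that

(2)    $C\, \mathbb{E}[H^2(s, \hat{s})] \le \inf_{V \in \mathbb{V}} \left\{ d_2^2(\sqrt{s}, V) + \eta_V^2 + \frac{\Delta(V)}{n} \right\}$

where $C$ is a universal positive constant and where

$$\eta_V = \inf\left\{ \eta > 0, \ \frac{D_V(\eta)}{\eta^2} \le n \right\}.$$

Moreover, there exists a random function $\hat{f} \in \bigcup_{V \in \mathbb{V}} V$ such that $\sqrt{\hat{s}} = \hat{f} \vee 0$.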

The condition $\sum_{V \in \mathbb{V}} e^{-\Delta(V)} \le 1$ can be interpreted as a (sub)probability on the
collection $\mathbb{V}$. The more complex the family $\mathbb{V}$, the larger the weights $\Delta(V)$. When
$\mathbb{V}$ consists of linear spaces $V$ of finite dimensions $D_V$, one can take $\eta_V^2 = D_V / n$ and
hence (2) leads to

$$C\, \mathbb{E}[H^2(s, \hat{s})] \le \inf_{V \in \mathbb{V}} \left\{ d_2^2(\sqrt{s}, V) + \frac{D_V + \Delta(V)}{n} \right\}.$$

When one can choose $\Delta(V)$ of order $D_V$, which means that the family $\mathbb{V}$ of models
does not contain too many models per dimension, the estimator $\hat{s}$ achieves the best
trade-off (up to a constant) between the approximation and the variance terms.
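To make this trade-off concrete, here is a minimal sketch that evaluates the right-hand side of the bound for a toy family containing one linear model per dimension, with weights $\Delta(V) = D_V$. The helper names `kraft_sum` and `oracle_bound`, the sample size and the approximation error $D^{-2}$ are hypothetical choices made for the illustration; the estimator itself is not computed here.

```python
import math

def kraft_sum(weights):
    """Sum of exp(-Delta(V)) over the family; the theorem requires it to be <= 1."""
    return sum(math.exp(-w) for w in weights)

def oracle_bound(approx_error_sq, dims, n):
    """Minimise d^2(sqrt(s), V) + (D_V + Delta(V)) / n over the family,
    assuming one model per dimension and Delta(V) = D_V."""
    best = None
    for d in dims:
        value = approx_error_sq(d) + (d + d) / n
        if best is None or value < best[1]:
            best = (d, value)
    return best

# Hypothetical squared approximation error D^(-2) (smoothness 1 in one variable).
dims = range(1, 201)
best_dim, best_value = oracle_bound(lambda d: d ** -2.0, dims, n=1000)
total_weight = kraft_sum(dims)
```

In this toy setting the minimum balances the approximation term against the variance term, and the weights satisfy the summability condition since $\sum_{D \ge 1} e^{-D} = 1/(e - 1) < 1$.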

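In the remaining part of this paper, we shall consider subsets $\mathcal{F} \subset \mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M)$
corresponding to various assumptions on $\sqrt{s}$ (smoothness, structural, parametric
assumptions, ...). To such an $\mathcal{F}$ we associate a collection $\mathbb{V}_{\mathcal{F}}$ of models, together with
a mapping $\Delta_{\mathcal{F}}$ as in Theorem 2.1, and deduce from
Theorem 2.1 a risk bound for the estimator $\hat{s}$ whenever $\sqrt{s}$ belongs or is close to $\mathcal{F}$.
This bound takes the form

(3)    $C'\, \mathbb{E}[H^2(s, \hat{s})] \le \inf_{f \in \mathcal{F}} \left\{ d_2^2(\sqrt{s}, f) + \varepsilon_{\mathcal{F}}(f) \right\}$

where

$$\varepsilon_{\mathcal{F}}(f) = \inf_{V \in \mathbb{V}_{\mathcal{F}}} \left\{ d_2^2(f, V) + \eta_V^2 + \frac{\Delta_{\mathcal{F}}(V)}{n} \right\}$$

and we shall bound from above the term $\varepsilon_{\mathcal{F}}(f)$. This upper bound will mainly depend
on some properties of $f$, for example smoothness ones. In this case, this result says
that if $\sqrt{s}$ is irregular but sufficiently close to a smooth function $f$, the bound we
get essentially corresponds to the one we would get for $f$. This can be interpreted
as a robustness property.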

Sometimes, several assumptions on $\sqrt{s}$ are plausible, and one does not know which
class $\mathcal{F}$ should be taken. A solution is to consider a collection $\mathbb{F}$ of such classes $\mathcal{F}$
and to use the proposition below to get an estimator whose risk satisfies (up to a
remaining term) relation (3) simultaneously for all classes $\mathcal{F} \in \mathbb{F}$.

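Proposition 2.1. Let $\mathbb{F}$ be an at most countable collection of subsets of $\mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M)$
and $\Delta$ be some mapping from $\mathbb{F}$ into $\mathbb{R}_+$ such that $\sum_{\mathcal{F} \in \mathbb{F}} e^{-\Delta(\mathcal{F})} \le 1$. For any $\mathcal{F} \in \mathbb{F}$,
let $\mathbb{V}_{\mathcal{F}}$ be a collection of models and $\Delta_{\mathcal{F}}$ be some mapping such that the assumptions
of Theorem 2.1 hold.

There exists an estimator $\hat{s}$ such that, for all $\mathcal{F} \in \mathbb{F}$,

$$C\, \mathbb{E}[H^2(s, \hat{s})] \le \inf_{f \in \mathcal{F}} \left\{ d_2^2(\sqrt{s}, f) + \varepsilon_{\mathcal{F}}(f) \right\} + \frac{\Delta(\mathcal{F})}{n}$$

where

$$\varepsilon_{\mathcal{F}}(f) = \inf_{V \in \mathbb{V}_{\mathcal{F}}} \left\{ d_2^2(f, V) + \eta_V^2 + \frac{\Delta_{\mathcal{F}}(V)}{n} \right\}$$

and where $C$ is a universal positive constant.

We illustrate Theorem 2.1 below with two preliminary examples of $\mathcal{F}$. More
general families $\mathcal{F}$ will be studied subsequently.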

2.2. Robustness with respect to an i.i.d. assumption. In this section, $\mathbb{X} =
\{1, \dots, n\}$, which means that we observe $n$ independent Poisson processes $N_i$, $1 \le
i \le n$, with respective intensities $s_i(\cdot) = s(\cdot, i)$ on $\mathbb{T}$. Assuming that the $N_i$ are i.i.d.
amounts to assuming that $\sqrt{s}$ belongs to

$$\mathcal{F} = \left\{ f \in \mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M), \ f(\cdot, i) = f(\cdot, j), \ \forall\, 1 \le i, j \le n \right\},$$

which means that the dependency with respect to the second variable can be dropped.
Given now a family of models $\mathbb{V}$ for approximating functions of one variable in
$\mathbb{L}^2(\mathbb{T}, \mu)$, we deduce from Theorem 2.1 the following result.

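Proposition 2.2. Let $\mathbb{V}$ be an at most countable family of models $V$ included in
$\mathbb{L}^2(\mathbb{T}, \mu)$ with bounded metric dimension $D_V(\cdot)$ and $\Delta$ be some mapping from $\mathbb{V}$ into
$\mathbb{R}_+$ such that $\sum_{V \in \mathbb{V}} e^{-\Delta(V)} \le 1$.

There exists an estimator $\hat{s} \in \mathbb{L}^1_+(\mathbb{T}, \mu)$ such that, for all $f \in \mathbb{L}^2(\mathbb{T}, \mu)$,

$$C\, \mathbb{E}[H^2(s, \hat{s})] \le \frac{1}{n} \sum_{i=1}^{n} d_t^2(\sqrt{s_i}, f) + \inf_{V \in \mathbb{V}} \left\{ d_t^2(f, V) + \eta_V^2 + \frac{\Delta(V)}{n} \right\},$$

where $C$ is a universal positive constant.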

If the Poisson processes $N_i$ were i.i.d., the preceding proposition could be proved
from a model selection theorem for a single Poisson process by considering the
aggregated process $\sum_{i=1}^{n} N_i$. We refer to the papers of Baraud and Birgé (2009),
Baraud (2010) and Birgé (2007) for such theorems. However, our result still holds
when the i.i.d. assumption is slightly violated, since, in such a case, the risk of our
estimator $\hat{s}$ cannot be severely deteriorated. This corresponds to some robustness
with respect to this assumption.

Under some smoothness or structural assumptions on $f$, and for a suitable choice
of $\mathbb{V}$, this bound can be made explicit in order to get rates of convergence, in the same
way as we shall do in Section 3.

2.3. Time-dependent covariates. In this section, we consider the case where we
have at hand $n$ repairable systems such that the times of failure of each of them can
be modelled by a Poisson process. For each system, we assume that the probability
of failure at time $t$ depends only on $t$ and the values of some measurements $(x(u))_{u \le t}$
which have been recorded up to time $t$. This means that for each $i \in \{1, \dots, n\}$, the
intensity $s_i$ of the Poisson process $N_i$ representing the times of failure of system $i$
is of the form $s_i(t) = s(t, (x_i(u))_{u \le t})$. In this section, we therefore consider the case
where $\mathbb{T} = \mathbb{R}_+$, and the covariate $x$ is a function from $\mathbb{T}$ into $\mathbb{R}^k$ (and hence $\mathbb{X}$ is the
set of functions from $\mathbb{T}$ into $\mathbb{R}^k$). A convenient simplification to model this kind
of situation is to assume that $s_i(t)$ actually depends on $t$ and $x_i(t)$ and not on the
past values $(x_i(u))_{u < t}$. Since the problem then amounts to estimating a function on
$\mathbb{R}_+ \times \mathbb{R}^k$, we introduce the class

$$\mathcal{F} = \left\{ f \in \mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M), \ \exists g \in \mathbb{L}^\infty(\mathbb{T} \times \mathbb{R}^k), \ \forall (t, x) \in \mathbb{T} \times \mathbb{X}, \ f(t, x) = g(t, x(t)) \right\}.$$

The preceding simplification holds when $\sqrt{s}$ does belong to this class. We assume
in this section that $\sqrt{s}$ is close to (but not necessarily an element of) $\mathcal{F}$, which
corresponds to some robustness with respect to this simplification. We deduce from
Theorem 2.1 the following result.

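Proposition 2.3. Assume that $\mu$ is a probability measure on $\mathbb{T}$. Let $\mathbb{V}$ be a family
of finite dimensional linear subspaces of $\mathbb{L}^\infty(\mathbb{T} \times \mathbb{R}^k)$ and $\Delta \ge 1$ be some mapping
on $\mathbb{V}$ such that $\sum_{V \in \mathbb{V}} e^{-\Delta(V)} \le 1$. There exists an estimator $\hat{s}$ such that, for all
$f \in \mathcal{F}$ and $g \in \mathbb{L}^\infty(\mathbb{T} \times \mathbb{R}^k)$ such that $f$ is of the form $f(t, x) = g(t, x(t))$,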

(4) CE [H\s, s)] < dl (Vi, /) + inf^ {^^(ff. ^) + ^^"^ ^ + ^^^^ | , 

where $C$ is a universal positive constant.

If $\sqrt{s}$ did belong to $\mathcal{F}$, the right-hand side of inequality (4) would correspond to the
bound we would get if we could estimate a function $g$ with $1 + k$ variables by model
selection. This bound can be made explicit under some assumptions on $g$ in the same
way as we shall do in the next section.



3. Some classical classes $\mathcal{F}$.

In this section, our aim is to control the quantity $\varepsilon_{\mathcal{F}}(f)$ appearing in (3) for
various classes of interest $\mathcal{F}$. Throughout this section, we shall assume that $\mu$ is the
Lebesgue measure.

3.1. Classes of smooth functions. Let $I = \prod_{j=1}^{k} I_j$ where the $I_j$ are intervals of
$\mathbb{R}$ and $\alpha = \beta + p \in (\mathbb{R}_+^*)^k$ with $p \in \mathbb{N}^k$ and $\beta \in \,]0, 1]^k$. A function $f$ belongs to
the Hölder class $\mathcal{H}^\alpha(I)$ if there exists $L(f) \in \mathbb{R}_+$ such that for all $(x_1, \dots, x_k) \in I$
and all $1 \le j \le k$, the functions $f_j(x) = f(x_1, \dots, x_{j-1}, x, x_{j+1}, \dots, x_k)$ admit a
derivative of order $p_j$ satisfying

$$\left| f_j^{(p_j)}(x) - f_j^{(p_j)}(y) \right| \le L(f) |x - y|^{\beta_j}, \quad \forall x, y \in I_j.$$

The class $\mathcal{H}^\alpha(I)$ is said to be isotropic when the $\alpha_j$ are all equal, and anisotropic
otherwise, in which case $\bar{\alpha}$ given by $\bar{\alpha}^{-1} = k^{-1} \sum_{j=1}^{k} \alpha_j^{-1}$ corresponds to the average
smoothness of a function $f$ in $\mathcal{H}^\alpha(I)$. We define the class of Hölderian functions on
$I$ by

$$\mathcal{H}(I) = \bigcup_{\alpha \in (\mathbb{R}_+^*)^k} \mathcal{H}^\alpha(I).$$

Assuming that $\sqrt{s}$ is Hölderian thus corresponds to the choice $\mathcal{F} = \mathcal{H}(\mathbb{T} \times \mathbb{X})$.
Anisotropic classes of smoothness are of particular interest in our context since the
function $s$ depends on variables $t$ and $x$ that may play very different roles.

Families of linear spaces possessing good approximation properties with respect to
the elements of $\mathcal{F}$ can be found in the literature. We refer to the results of Dahmen
et al. (1980). We may use these linear spaces (models) to approximate the elements
of $\mathcal{F}$ and deduce from Theorem 2.1 the following result.

Corollary 3.1. Let us assume that $\mathbb{T} \times \mathbb{X} = [0, 1]^k$. There exists an estimator $\hat{s}$
such that, for all $f \in \mathcal{H}([0, 1]^k)$, the inequality below holds:



where $\alpha \in (\mathbb{R}_+^*)^k$ is such that $f \in \mathcal{H}^\alpha([0, 1]^k)$ and where $C > 0$ depends only on $k$
and $\max_{1 \le j \le k} \alpha_j$.

Note that the risk bound given by inequality (5) holds without any restriction
on $\alpha$. Such generality can be obtained since our model selection theorem is valid
for any collection $\mathbb{V}$ of finite dimensional linear spaces. Some restrictions on the
dimensionality of the linear spaces $V \in \mathbb{V}$ (as in Comte et al. (2011)) would prevent us
from getting this rate of convergence for the Hölder classes $\mathcal{H}^\alpha([0, 1]^k)$ when $\min_{1 \le j \le k} \alpha_j$
is too small.

The preceding risk bound is quite satisfactory if k is small but becomes worse 
when k increases. We shall therefore consider other types of classes in the next 
section in order to avoid this curse of dimensionality. 
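The effect of the dimension can be illustrated numerically. The sketch below assumes the classical rate $n^{-2\bar{\alpha}/(2\bar{\alpha} + k)}$ for anisotropic Hölder classes, where $\bar{\alpha}$ is the harmonic mean of the $\alpha_j$; this rate is taken here as an assumption and the function names are hypothetical.

```python
def harmonic_mean_smoothness(alpha):
    """Average smoothness abar defined by 1/abar = (1/k) * sum of 1/alpha_j."""
    k = len(alpha)
    return k / sum(1.0 / a for a in alpha)

def holder_rate_exponent(alpha):
    """Exponent r such that the assumed rate is n^(-r), with r = 2*abar / (2*abar + k)."""
    abar = harmonic_mean_smoothness(alpha)
    return 2.0 * abar / (2.0 * abar + len(alpha))

# Lipschitz regularity in every direction, in dimension 1 and in dimension 10.
exponent_dim_1 = holder_rate_exponent([1.0])
exponent_dim_10 = holder_rate_exponent([1.0] * 10)
```

Under this assumption the exponent drops from $2/3$ with one variable to $1/6$ with ten variables, which is the deterioration that structural assumptions aim to avoid.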

3.2. Approximation by composite functions. An alternative to smoothness
assumptions is to consider structural ones. In this section we focus on the case where
$\sqrt{s}$ is equal (or at least close) to a composite function of the form $g \circ u$ where $g$ is a
continuous mapping from $\mathbb{R}^l$ into $\mathbb{R}$ and $u$ maps $\mathbb{T} \times \mathbb{X}$ into a compact subset of $\mathbb{R}^l$.

We shall restrict ourselves to two cases of composite functions that are of particular
interest for the problem we are considering. Nevertheless, more general composite
functions $g \circ u$ could be handled in a similar way as in Baraud and Birgé (2011),
leading to an analogue of their Theorem 1.

Throughout this section we shall use the following notation. For $g \in \mathcal{H}^\alpha(I)$ where
$I = \prod_{j=1}^{k} I_j$, we denote by $\|g\|_\alpha$ any number such that, for all $(x_1, \dots, x_k) \in I$, the
function $g_j(x) = g(x_1, \dots, x_{j-1}, x, x_{j+1}, \dots, x_k)$ satisfies

$$\forall x, y \in I_j, \quad |g_j(x) - g_j(y)| \le \|g\|_\alpha |x - y|^{\alpha_j \wedge 1}.$$

3.2.1. A multiple index model for covariate effect. In this section, we fix $l \in \mathbb{N}^*$ and
we consider the case where $\sqrt{s}$ is equal (or close) to an element of

$$\mathcal{F}_l = \left\{ f \in \mathbb{L}^2(\mathbb{T} \times \mathbb{X}, M), \ \forall (t, x) \in \mathbb{T} \times \mathbb{X}, \ f(t, x) = g(t, \langle \theta_1, x \rangle, \dots, \langle \theta_l, x \rangle), \ g \in \mathcal{H}(\mathbb{T} \times [-1, 1]^l), \ \forall\, 1 \le \ell \le l, \ \theta_\ell \in B(0, 1) \right\},$$

where $\mathbb{T} = [0, 1]^{k_1}$ and where $\mathbb{X} = B(0, 1) = \{x \in \mathbb{R}^{k_2}, \ \sum_{j=1}^{k_2} x_j^2 \le 1\}$ is the unit ball
of $\mathbb{R}^{k_2}$. This class is related to the multiple index model and is introduced to
reduce the curse of dimensionality when $k_2$ is large.

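Corollary 3.2. There exists an estimator $\hat{s}$ such that, for all $\alpha \in (\mathbb{R}_+^*)^{k_1 + l}$, $\theta_1, \dots, \theta_l \in
B(0, 1)$, $g \in \mathcal{H}^\alpha([0, 1]^{k_1} \times [-1, 1]^l)$ and $f \in \mathcal{F}_l$ of the form

$$f(t, x) = g(t, \langle \theta_1, x \rangle, \dots, \langle \theta_l, x \rangle),$$

the following inequality holds: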

CTE (s, s)] < di(v^,/) + (L(c/))2°+'=i+' n 

Inn V (ln||5||2/c-i) V 1 
n 

where $C > 0$ depends only on $l$, $k_1$, and $\alpha$.

In the case where $\sqrt{s}$ belongs to $\mathcal{F}_l$, $\sqrt{s}$ is a Hölderian function with $k_1 + k_2$
variables and one could apply Corollary 3.1 to estimate $s$. However, it
is easy to see that the preceding bound is much faster than the one we could get under
the smoothness assumption only. As soon as $l < k_2$, the largest term of (6) corresponds
to the estimation rate of a Hölderian function $g$ with only $k_1 + l$ variables and not
$k_1 + k_2$.

By using Proposition 2.1 with $\mathbb{F} = \{\mathcal{F}_l, \ l \in \mathbb{N}^*\}$, we derive an estimator $\hat{s}$ whose
risk satisfies inequality (6) simultaneously for all $l \in \mathbb{N}^*$.

3.2.2. An Accelerated Failure-Time model. The Accelerated Failure-Time model is a
model in which the covariates change the time scale of some reference Poisson process
$N$ on $\mathbb{R}_+$. The Poisson process $N$ typically corresponds to the times of failure of a
repairable system in standard conditions. Here, we have at hand $n$ systems, and we
assume that for each $i \in \{1, \dots, n\}$ and each $t \in \mathbb{R}_+$, the average number of failures
of system $i$ before time $t$, $\mathbb{E}[N_i([0, t])]$, is of the form $\mathbb{E}[N_i([0, t])] = \mathbb{E}[N([0, t u(x_i)])]$
where $u$ is a non-negative function on $\mathbb{X}$. This means that if $u(x_i) = 1$, system $i$
runs in normal conditions and if $u(x_i) < 1$ (respectively $u(x_i) > 1$) the covariates
extend (respectively reduce) the life-time of system $i$. Throughout this section, we
assume that we observe each system $i \in \{1, \dots, n\}$ on the interval of time $\mathbb{T} = [0, 1]$.
If the model is exact, and if $g^2$ denotes the underlying intensity of $N$, the above
relation gives that $s$ satisfies $\sqrt{s(t, x)} = v(x) g(t v^2(x))$ for all $(t, x) \in [0, 1] \times \mathbb{X}$,
where $v(x) = \sqrt{u(x)}$. As usual, we shall not assume that the model is exact, but
rather that $\sqrt{s}$ is close to a function $f$ of the form

(7)    $f(t, x) = v(x) g(t v^2(x)), \quad \forall (t, x) \in [0, 1] \times \mathbb{X},$

where $g$ and $v$ are smooth unknown functions. More precisely, we consider that
$\mathbb{X} = [0, 1]^{k_2}$ and we introduce the class of functions $\mathcal{F}$ given by

$$\mathcal{F} = \left\{ f \in \mathbb{L}^2([0, 1] \times [0, 1]^{k_2}, M), \ \exists v \in \mathcal{H}([0, 1]^{k_2}), \ \exists g \in \mathcal{H}([0, \|v\|_\infty^2]), \ \forall (t, x) \in [0, 1] \times [0, 1]^{k_2}, \ f(t, x) = v(x) g(t v^2(x)) \right\}.$$
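For the reader's convenience, the form of $f$ can be recovered from the time-change relation by a short computation. The sketch below assumes, for this derivation only, that $N$ has intensity $g^2$ with respect to the Lebesgue measure, that $g \ge 0$, and that differentiation in $t$ is legitimate.

```latex
\begin{align*}
\mathbb{E}\left[N_i([0,t])\right]
  &= \mathbb{E}\left[N([0, t\,u(x_i)])\right]
   = \int_0^{t\,u(x_i)} g^2(y)\,\mathrm{d}y, \\
s(t, x_i)
  &= \frac{\partial}{\partial t}\int_0^{t\,u(x_i)} g^2(y)\,\mathrm{d}y
   = u(x_i)\, g^2\bigl(t\,u(x_i)\bigr), \\
\sqrt{s(t, x_i)}
  &= \sqrt{u(x_i)}\; g\bigl(t\,u(x_i)\bigr)
   = v(x_i)\, g\bigl(t\,v^2(x_i)\bigr),
  \qquad v = \sqrt{u}.
\end{align*}
```

Since $t \in [0, 1]$, the argument $t v^2(x)$ ranges over $[0, \|v\|_\infty^2]$, which is why $g$ only needs to be defined on that interval.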
The result is the following. 

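Corollary 3.3. There exists an estimator $\hat{s}$ such that for all $f \in \mathcal{F}$ of the form (7)
with $v \in \mathcal{H}^\beta([0, 1]^{k_2})$, $v^2 \in \mathcal{H}^\gamma([0, 1]^{k_2})$, $g \in \mathcal{H}^\alpha([0, \|v\|_\infty^2])$ for some $\alpha \in \mathbb{R}_+^*$,
$\beta \in (\mathbb{R}_+^*)^{k_2}$, $\gamma \in (\mathbb{R}_+^*)^{k_2}$, the following inequality holds: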



n 



, / T t 2\o:^l\ 27(aAl 



/'lnnVln(||t;||^"'''^+'||(7||„) VI 



2T'(aAl) 
2^{aAl)+fc2 



n 



InnVlnffl V llrf^"^^^) ||r||,^||9|I„') VI 
+ {\H]i^'"L{g)) + ^ 

where $C > 0$ depends only on $k_2$, $\alpha$, $\beta$ and $\gamma$.

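Let us make some comments on the inequality above. If $\sqrt{s}$ does belong to $\mathcal{F}$
and if one is only interested in the rate of convergence, we can bound the risk of $\hat{s}$
by

(8)    $C'\, \mathbb{E}[H^2(s, \hat{s})] \le \max\left\{ \left( \frac{1}{n} \right)^{\frac{2\alpha}{2\alpha + 1}}, \ \left( \frac{\ln n}{n} \right)^{\frac{2\bar{\beta}}{2\bar{\beta} + k_2}}, \ \left( \frac{\ln n}{n} \right)^{\frac{2\bar{\gamma}(\alpha \wedge 1)}{2\bar{\gamma}(\alpha \wedge 1) + k_2}} \right\}$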



for some constant $C' > 0$ depending on $s$. The first term corresponds to the rate
of convergence for estimating $g$ only. If $g$ is at least Lipschitz, the two other terms
correspond to the rates of estimation of $v$ and $v^2$ respectively (up to the logarithmic
term). Note that it is always possible to choose $\gamma = \beta$, since $v^2$ is at least as regular
as $v$, in which case the rate becomes

$$C'\, \mathbb{E}[H^2(s, \hat{s})] \le \max\left\{ \left( \frac{1}{n} \right)^{\frac{2\alpha}{2\alpha + 1}}, \ \left( \frac{\ln n}{n} \right)^{\frac{2\bar{\beta}(\alpha \wedge 1)}{2\bar{\beta}(\alpha \wedge 1) + k_2}} \right\}.$$

Nevertheless, in some situations, $v^2$ may be more regular than $v$ (think for instance
of $v(x) = \sqrt{x}$ on $\mathbb{X} = [0, 1]$) and hence, if $\gamma$ is large enough ($\bar{\gamma} \ge \bar{\beta} / (\alpha \wedge 1)$), the
rate we get becomes

$$C'\, \mathbb{E}[H^2(s, \hat{s})] \le \max\left\{ \left( \frac{1}{n} \right)^{\frac{2\alpha}{2\alpha + 1}}, \ \left( \frac{\ln n}{n} \right)^{\frac{2\bar{\beta}}{2\bar{\beta} + k_2}} \right\}.$$

It is interesting to compare the rate (8) to the one we would obtain under a pure
smoothness assumption on $\sqrt{s}$, ignoring that $\sqrt{s}$ is of the form (7). To do so, we
need to specify the regularity of $\sqrt{s}$, knowing that of $v$ and $g$. This is the purpose
of the following lemma. For the sake of simplicity, we take $k_2 = 1$, $\alpha \in \,]0, 1]$ and $\beta \in \,]0, 1]$.

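Lemma 3.1. Let $\alpha \in \,]0, 1]$, $\beta \in \,]0, 1]$, $v \in \mathcal{H}^\beta([0, 1])$ and $g \in \mathcal{H}^\alpha([0, \|v\|_\infty^2])$. Then,
the function $f$ defined by $f(t, x) = v(x) g(t v^2(x))$ belongs to $\mathcal{H}^{(\alpha, \alpha\beta)}([0, 1]^2)$.

Moreover, there exist $v \in \mathcal{H}^\beta([0, 1])$ and $g \in \mathcal{H}^\alpha([0, \|v\|_\infty^2])$ such that, for all
$\alpha' \in \,]0, 1]$ and all $\beta' \in \,]0, 1]$, the preceding function $f(t, x) = v(x) g(t v^2(x))$ belongs to
$\mathcal{H}^{(\alpha', \alpha'\beta')}([0, 1]^2)$ if and only if $\alpha' \le \alpha$ and $\beta' \le \beta$.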

If $k_2 = 1$ and if $\sqrt{s}$ is of the form (7) with $v \in \mathcal{H}^\beta([0, 1])$ and $g \in \mathcal{H}^\alpha([0, \|v\|_\infty^2])$,
then $\sqrt{s}$ is Hölderian with regularity $(\alpha, \alpha\beta)$ on $[0, 1]^2$, and this regularity cannot be
improved in general except in some particular situations. Under this smoothness
assumption alone, the rate of estimation we would get is $n^{-2\alpha\beta / (2\alpha\beta + \beta + 1)}$. This rate is
always slower than the rate we obtain under the structural assumption on $\sqrt{s}$.
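The comparison between the two rates can be checked numerically. In the sketch below, the smoothness-only exponent is $2\alpha\beta/(2\alpha\beta + \beta + 1)$, and the structural exponent is read off from the rate obtained with $\gamma = \beta$, $k_2 = 1$ and $\alpha \le 1$, ignoring logarithmic factors. The function names are hypothetical and the check on a grid is only an illustration, not a proof.

```python
def smooth_exponent(alpha, beta):
    """Exponent under smoothness only: regularity (alpha, alpha*beta) in two variables."""
    return 2 * alpha * beta / (2 * alpha * beta + beta + 1)

def structural_exponent(alpha, beta):
    """Exponent under the structural assumption with gamma = beta, k2 = 1,
    alpha <= 1, logarithmic factors ignored."""
    rate_g = 2 * alpha / (2 * alpha + 1)
    rate_v = 2 * alpha * beta / (2 * alpha * beta + 1)
    return min(rate_g, rate_v)

# Compare both exponents on a grid of (alpha, beta) in ]0, 1]^2.
grid = [j / 10 for j in range(1, 11)]
structural_always_faster = all(
    smooth_exponent(a, b) < structural_exponent(a, b) for a in grid for b in grid
)
```

For $\alpha = \beta = 1$, for instance, the exponents are $1/2$ and $2/3$ respectively.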

4. Families $\mathcal{F}$ of product functions.

A common way to model the influence of the covariates on the number of failures
of $n$ systems is to assume that, for each $i \in \{1, \dots, n\}$, the intensity of $N_i$ is
of the form $s(t, x_i) = u(t) v(x_i)$ where $u$ is an unknown density function on $\mathbb{T}$,
and $v$ some unknown function from $\mathbb{X}$ into $\mathbb{R}_+$. This means that, on average, the
number of failures of system $i$, $\mathbb{E}[N_i(\mathbb{T})] = v(x_i)$, depends on $x_i$ through $v$ only,
and conditionally on $N_i(\mathbb{T}) = k_i > 0$, the times of failure are distributed along $\mathbb{T}$
independently of $x_i$, according to the density $u$.
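This description translates directly into a simulation recipe: draw the number of failures from a Poisson distribution with mean $v(x_i)$, then draw the failure times independently from the density $u$. The sketch below follows that recipe with hypothetical choices ($u(t) = 2t$ on $[0, 1]$, $v(x_i) = 3$) and a simple Poisson sampler (Knuth's multiplication method) that is only adequate for small means.

```python
import math
import random

def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_failures(v_x, sample_time, rng):
    """Failure times of one system whose intensity is u(t) * v_x on T."""
    number = sample_poisson(v_x, rng)
    return sorted(sample_time(rng) for _ in range(number))

rng = random.Random(0)
# Hypothetical density u(t) = 2t on [0, 1], sampled by inverse CDF sqrt(U).
draw_time = lambda r: math.sqrt(r.random())
runs = [simulate_failures(3.0, draw_time, rng) for _ in range(20000)]
mean_count = sum(len(times) for times in runs) / len(runs)
all_times = [t for times in runs for t in times]
mean_time = sum(all_times) / len(all_times)
```

Over many replications the average number of failures should be close to $v(x_i)$ and the average failure time close to the mean of $u$, whatever the value of $v(x_i)$.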

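We shall therefore consider the class $\mathcal{F}$ defined by

(9)    $\mathcal{F} = \left\{ \kappa v_1 v_2, \ \kappa > 0, \ (v_1, v_2) \in \mathbb{L}^2(\mathbb{T}, \mu) \times \mathbb{L}^2(\mathbb{X}, \nu_n), \ \|v_1\|_t = \|v_2\|_x = 1 \right\},$

which amounts to assuming that $s$ is of the form (or close to) a product function
$u(t) v(x)$ with $u = v_1^2$ and $v = \kappa^2 v_2^2$.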

In this section, we introduce collections of models $\mathbb{V}_1$ and $\mathbb{V}_2$ in order to
approximate the components $v_1$ and $v_2$ separately. For each $V_1 \in \mathbb{V}_1$ to approximate $v_1$ and
$V_2 \in \mathbb{V}_2$ to approximate $v_2$, we approximate $v_1 v_2$ by the model $V_1 \otimes V_2$ defined by

(10)    $V_1 \otimes V_2 = \left\{ v_1 v_2, \ (v_1, v_2) \in V_1 \times V_2 \right\}.$

The metric dimension of $V_1 \otimes V_2$ is controlled as follows.

Lemma 4.1. Let $V_1$ and $V_2$ be finite dimensional linear spaces of $\mathbb{L}^2(\mathbb{T}, \mu)$ and
$\mathbb{L}^2(\mathbb{X}, \nu_n)$ respectively. The set $V_1 \otimes V_2$ defined by (10) has a finite metric dimension
bounded by

$$D_{V_1 \otimes V_2} = 1.4 \left( \dim V_1 + \dim V_2 + 1 \right).$$
By using Theorem 2.1, we prove the following result.

Proposition 4.1. Let $\mathbb{V}_1$ and $\mathbb{V}_2$ be at most countable collections of finite dimensional
linear spaces of $\mathbb{L}^2(\mathbb{T}, \mu)$ and $\mathbb{L}^2(\mathbb{X}, \nu_n)$ respectively. Moreover, for $i \in \{1, 2\}$,
let $\Delta_i$ be some mapping on $\mathbb{V}_i$ with values in $[1, +\infty)$ such that $\sum_{V_i \in \mathbb{V}_i} e^{-\Delta_i(V_i)} \le 1$.

There exists an estimator $\hat{s}$ such that, for all $\kappa v_1 v_2 \in \mathcal{F}$ where $\mathcal{F}$ is defined
by (9), the following inequality holds:

$$C\, \mathbb{E}[H^2(s, \hat{s})] \le d_2^2(\sqrt{s}, \kappa v_1 v_2) + \inf_{V_1 \in \mathbb{V}_1} \left\{ \kappa^2 d_t^2(v_1, V_1) + \frac{\dim V_1 + \Delta_1(V_1)}{n} \right\} + \inf_{V_2 \in \mathbb{V}_2} \left\{ \kappa^2 d_x^2(v_2, V_2) + \frac{\dim V_2 + \Delta_2(V_2)}{n} \right\}$$

where $C$ is a universal positive constant. Furthermore, $\sqrt{\hat{s}}$ belongs to $\mathcal{F}$.

Apart from the term $d_2^2(\sqrt{s}, \kappa v_1 v_2)$, which corresponds to some robustness with
respect to the assumption $\sqrt{s} \in \mathcal{F}$, the risk bound we get corresponds to the one
we would get if we could apply a model selection theorem on the components $v_1$ and
$v_2$ separately.

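4.1. Smoothness assumptions on $v_1$ and $v_2$. Let us illustrate this proposition
by setting $\mathbb{T} = [0, 1]^{k_1}$, $\mathbb{X} = [0, 1]^{k_2}$, $\mu$ the Lebesgue measure and

(11)    $\mathcal{F} = \left\{ \kappa v_1 v_2, \ \kappa > 0, \ v_1 \in \mathcal{H}([0, 1]^{k_1}), \ \|v_1\|_t = 1, \ v_2 \in \mathcal{H}([0, 1]^{k_2}), \ \|v_2\|_x = 1 \right\}.$

Corollary 4.1. There exists an estimator $\hat{s}$ such that, for all $\kappa v_1 v_2 \in \mathcal{F}$ where $\mathcal{F}$
is defined by (11), the following inequality holds: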

CE[i?2(s,s)] < (ii(^,Kz;iU2) + K'"+'=ii(t;i)2«+fcin 

2k2 2fc2 _ 2/3 

+ k2/3+'=2 L (^2) 2'3+fe2 n 2/3+fc2 + 

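where $\alpha \in (\mathbb{R}_+^*)^{k_1}$ is such that $v_1 \in \mathcal{H}^\alpha([0, 1]^{k_1})$, where $\beta \in (\mathbb{R}_+^*)^{k_2}$ is such that
$v_2 \in \mathcal{H}^\beta([0, 1]^{k_2})$, and where $C > 0$ depends only on $k_1$, $k_2$, $\max_{1 \le j \le k_1} \alpha_j$, and
$\max_{1 \le j \le k_2} \beta_j$.

In particular, if $s$ is a product function of the form $\sqrt{s} = \kappa v_1 v_2$ for $v_1 \in \mathcal{H}^\alpha([0, 1]^{k_1})$
and $v_2 \in \mathcal{H}^\beta([0, 1]^{k_2})$, then $\sqrt{s}$ is Hölderian with regularity $(\alpha, \beta)$ on $[0, 1]^{k_1 + k_2}$. However,
the rate given by the corollary above is always faster than the one we would get by
Corollary 3.1 under the smoothness assumption only.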

4.2. Mixing smoothness and structural assumptions. In Section 3.2.1 we have
studied the case where $s(t, x)$ is of the form $s(t, x) = g(t, \langle \theta_1, x \rangle, \dots, \langle \theta_l, x \rangle)$
where $g$ and $\theta_1, \dots, \theta_l$ are unknown. We now assume that the covariate effect is
multiplicative, and we approximate $\sqrt{s}$ by the class of functions $\mathcal{F}$ defined by

(12)    $\mathcal{F} = \left\{ \kappa v_1 v_2, \ \kappa > 0, \ v_1 \in \mathcal{H}([0, 1]^{k_1}), \ \theta_1, \dots, \theta_l \in B(0, 1), \ g \in \mathcal{H}([-1, 1]^l), \ \forall x \in \mathbb{X}, \ v_2(x) = g(\langle \theta_1, x \rangle, \dots, \langle \theta_l, x \rangle), \ \|v_1\|_t = \|v_2\|_x = 1 \right\}$

where we have chosen $\mathbb{T} = [0, 1]^{k_1}$, $\mu$ the Lebesgue measure and $\mathbb{X} = B(0, 1) = \{x \in
\mathbb{R}^{k_2}, \ \sum_{j=1}^{k_2} x_j^2 \le 1\}$ the unit ball of $\mathbb{R}^{k_2}$.

Corollary 4.2. There exists an estimator $\hat{s}$ such that, for all $\kappa v_1 v_2 \in \mathcal{F}$ where
$\mathcal{F}$ is defined by (12), the following inequality holds:

21 21 20 In (Ac^||o||iA;^^) V Inn V 1 

+n'^f>+i L{g)^i3+i n 2/3+' H ^ p_^j_ 

n 

where $\alpha \in (\mathbb{R}_+^*)^{k_1}$, $\beta \in (\mathbb{R}_+^*)^{l}$ are such that $v_1 \in \mathcal{H}^\alpha([0, 1]^{k_1})$, $g \in \mathcal{H}^\beta([-1, 1]^l)$ with
$v_2(x) = g(\langle \theta_1, x \rangle, \dots, \langle \theta_l, x \rangle)$, and where $C > 0$ depends only on $k_1$, $l$, $\alpha$ and
$\beta$.

When $\sqrt{s}$ belongs to the class $\mathcal{F}$, the risk bound of the inequality above corresponds
to the one we would get if we could estimate the functions $v_1$ and $g$ separately.
This risk bound is then better than the one we would get under smoothness
assumptions only, or under the structural assumption of Section 3.2.1.

4.3. A model selection theorem for parametric assumptions. In this section,
we introduce a parametric class $\mathcal{F}$ of the form

$$\mathcal{F} = \left\{ a u_b v_\theta, \ a > 0, \ b \in I, \ \theta \in \mathbb{R}^{k_2} \right\},$$

where $I$ is an interval of $\mathbb{R}$, $(u_b)_{b \in I}$ is a family of functions and $v_\theta$ is defined by
$v_\theta(x) = \exp(\langle x, \theta \rangle)$ for $x \in \mathbb{X} = \{x \in \mathbb{R}^{k_2}, \ \sum_{j=1}^{k_2} x_j^2 \le 1\}$, the unit ball of $\mathbb{R}^{k_2}$. For
each $i \in \{1, \dots, n\}$, the intensity of $N_i$, $s(\cdot, x_i)$, is thus assumed to be proportional to
an element of (or an element close to) some reference parametric model $\{u_b^2, \ b \in I\}$.
Let us give three examples of such models.

The Duane model (also known as the Power Law Process) amounts to assuming
that $u_b(t) = t^b$ with $b \in (-1/2, +\infty)$ and $t \in \mathbb{T} = (0, 1]$. First proposed in Duane
(1964), this is one of the most popular models used in reliability. Indeed, although
the intensity is simple, different situations can be modelled by this model. For
example, if $b = 0$ each $N_i$ is a homogeneous Poisson process, whereas if
$b < 0$ (respectively $b > 0$) the intensity of failures decreases (respectively increases) with
time, so that the reliability of each system improves (respectively deteriorates). In software
reliability, we can cite the Goel-Okumoto model
of Goel and Okumoto (1979) and the S-Shaped model of Yamada et al. (1983). The
former amounts to assuming that $u_b(t)$ is of the form $u_b(t) = e^{-bt}$ whereas the latter
corresponds to $u_b(t) = \sqrt{t}\, e^{-bt}$, where $b \in (0, +\infty)$ and $t \in \mathbb{T} = [0, +\infty)$.
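As a quick sanity check on these three families, the sketch below evaluates $\int u_b^2 \,\mathrm{d}\mu$ numerically and compares it with elementary closed forms: $1/(2b + 1)$ for $u_b(t) = t^b$ on $(0, 1]$, $1/(2b)$ for $u_b(t) = e^{-bt}$ and $1/(4b^2)$ for $u_b(t) = \sqrt{t}\, e^{-bt}$ on $[0, +\infty)$. The parameterisation $u_b(t) = t^{k/2} e^{-bt}$, the function names and the truncation of the half-line at $t = 40$ are assumptions made for this illustration.

```python
import math

def duane(b):
    """u_b(t) = t^b on T = (0, 1], with b > -1/2."""
    return lambda t: t ** b

def exponential_family(b, k=0):
    """u_b(t) = t^(k/2) * exp(-b t); k = 0 (Goel-Okumoto type), k = 1 (S-shaped type)."""
    return lambda t: t ** (k / 2) * math.exp(-b * t)

def squared_norm(u, lower, upper, steps=100000):
    """Midpoint approximation of the integral of u^2 over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(u(lower + (j + 0.5) * h) ** 2 for j in range(steps))

norm_duane = squared_norm(duane(1.0), 0.0, 1.0)                       # exact value 1/3
norm_goel = squared_norm(exponential_family(1.0, k=0), 0.0, 40.0)     # exact value 1/2
norm_s_shaped = squared_norm(exponential_family(1.0, k=1), 0.0, 40.0) # exact value 1/4
```

These norms are finite exactly when $b > -1/2$ in the first case and $b > 0$ in the other two, which matches the parameter ranges given above.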

A standard way to estimate $s$ is to maximize the likelihood, as in Lawless (1987). However, our method is more general since we can (by Proposition 2.1) deal simultaneously with several (parametric and non-parametric) assumptions. Moreover, the maximum likelihood estimator is not robust, which can be seen by adapting Section 2.3 of Birgé (2006) to Poisson processes.

To estimate $s$ with our method, we have to consider the following assumption on the model $\{u_b,\ b \in I\}$.

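Assumption 4.1. The family $(u_b)_{b \in I}$ is a family of non-vanishing functions of $\mathbb{L}^2(\mathbb{T}, \mu)$ indexed by an interval $I$ of the form $(b_0, +\infty)$. Moreover, there exist two non-increasing functions $\underline{\rho}, \overline{\rho} > 0$ on $I$ such that, for all $b, b' \in I$,

\[ \underline{\rho}(b \vee b')\, |b - b'| \le \left\| \frac{u_b}{\|u_b\|_t} - \frac{u_{b'}}{\|u_{b'}\|_t} \right\|_t \le \overline{\rho}(b \wedge b')\, |b - b'|. \]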



The purpose of the lemma below is to show that the assumption above holds for 
the Duane, Goel-Okumoto and S-Shaped models. 

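Lemma 4.2. Let $I = (-1/2, +\infty)$, $\mathbb{T} = (0, 1]$, $\mu$ the Lebesgue measure, and for $b \in I$, $u_b(t) = t^b$. Assumption 4.1 is satisfied with

\[ \forall u > -1/2, \quad \underline{\rho}(u) = \overline{\rho}(u) = \frac{1}{2u + 1}. \]

Let $I = (0, +\infty)$, $\mathbb{T} = [0, +\infty)$, $\mu$ the Lebesgue measure, $k \in \mathbb{N}$, and for $b \in I$, $u_b(t) = t^{k/2} e^{-bt}$. Assumption 4.1 is satisfied with

\[ \forall u > 0, \quad \underline{\rho}(u) = \frac{1}{2u} \quad \text{and} \quad \overline{\rho}(u) = \frac{\sqrt{k+1}}{2u}. \]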
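The two-sided bound of Assumption 4.1 can be checked numerically in the Duane case. The sketch below relies on two assumptions that are not stated in this form in the text: the closed form of the scalar product of the normalized functions on $(0,1]$, $\sqrt{(2b+1)(2b'+1)}/(1+b+b')$, and the reading $\rho(u) = 1/(2u+1)$ for both bounding functions. Function names are hypothetical.

```python
import math

def normalized_distance(b, bp):
    """L2((0,1]) distance between t**b / ||t**b|| and t**bp / ||t**bp||.

    Uses the closed-form scalar product sqrt((2b+1)(2bp+1)) / (1 + b + bp)."""
    inner = math.sqrt((2 * b + 1) * (2 * bp + 1)) / (1 + b + bp)
    return math.sqrt(max(0.0, 2.0 - 2.0 * inner))

def rho(u):
    """Assumed bounding function for the Duane model: rho(u) = 1 / (2u + 1)."""
    return 1.0 / (2 * u + 1)

def check(b, bp, tol=1e-12):
    """True if rho(max) |b - bp| <= distance <= rho(min) |b - bp|."""
    d = normalized_distance(b, bp)
    lower = rho(max(b, bp)) * abs(b - bp)
    upper = rho(min(b, bp)) * abs(b - bp)
    return lower <= d + tol and d <= upper + tol
```

Calling `check(b, bp)` on a few pairs with `b, bp > -1/2` gives a quick sanity check of the inequality; it is not a proof.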



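To write our results, all along this section, $\|\cdot\|$ denotes the standard Euclidean norm of $\mathbb{R}^{k_2}$,

\[ \forall x \in \mathbb{R}^{k_2}, \quad \|x\|^2 = \sum_{i=1}^{k_2} x_i^2, \]

and $d$ the distance induced by this norm.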

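Proposition 4.2. Let $(u_b)_{b \in I}$ be a family such that Assumption 4.1 holds. There exist $\hat{a} > 0$, $\hat{b} \in I$ and $\hat{\theta} \in \mathbb{R}^{k_2}$ such that the estimator $\hat{s} = (\hat{a} u_{\hat{b}} v_{\hat{\theta}})^2$ satisfies, for all $a > 0$, $b \in I$, $\theta \in \mathbb{R}^{k_2}$, and $f \in \mathscr{F}$ of the form $f(t, x) = a u_b(t) v_\theta(x)$,

\[ C\, \mathbb{E}[H^2(s, \hat{s})] \le d_2^2(\sqrt{s}, f) + \frac{(1 \vee \|\theta\|)\, k_2}{n} + \frac{C'}{n} \tag{13} \]

where $C$ is a universal positive constant and where $C'$ depends only on $\underline{\rho}$, $\overline{\rho}$, $b_0$ and $b$. More precisely,

\[ C' = \ln\left( 1 \vee \overline{\rho}\left( b_0 + \frac{b - b_0}{b - b_0 + 1} \right) \right) + |\ln(1 \wedge \underline{\rho}(1 + b))| + |\ln(b - b_0)|. \]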

Under parametric assumptions on $s$, this result says that the rate of convergence of $\hat{s}$ is of order $n^{-1}$, which is quite satisfying when $n$ is large, but may be inadequate from a non-asymptotic point of view. Indeed, the second term of the right-hand side of inequality (13) may be large, especially when $k_2$ is large, say larger than $n$. To avoid this difficulty, a convenient assumption amounts to considering that $\theta$ is sparse, which means that $\theta$ is close to some (unknown) linear subspace $W$ of $\mathbb{R}^{k_2}$ with $\dim W$ small. We now generalize Proposition 4.2 to take this situation into account.
Proposition 4.3. Let $(u_b)_{b \in I}$ be a family such that Assumption 4.1 holds. Let $\mathbb{W}$ be an at most countable family of linear subspaces of $\mathbb{R}^{k_2}$ and let $\Delta$ be a non-negative map on $\mathbb{W}$ such that $\sum_{W \in \mathbb{W}} e^{-\Delta(W)} \le 1$.

There exist $\hat{a} > 0$, $\hat{b} \in I$ and $\hat{\theta} \in \mathbb{R}^{k_2}$ such that the estimator $\hat{s} = (\hat{a} u_{\hat{b}} v_{\hat{\theta}})^2$ satisfies, for all $a > 0$, $b \in I$, $\theta \in \mathbb{R}^{k_2}$, and $f \in \mathscr{F}$ of the form $f(t, x) = a u_b(t) v_\theta(x)$,

where $C$ is a universal positive constant and where $C'$ is given by Proposition 4.2.

For illustration purposes, let us make explicit the constant $C'$ for the Duane model, and let us therefore assume that there exist some unknown parameters $a$, $b$, $\theta$ such that $s$ is of the form $\sqrt{s(t, x)} = a t^b \exp(\langle \theta, x \rangle)$. We derive from Proposition 4.2 an estimator whose risk satisfies

\[ C\, \mathbb{E}[H^2(s, \hat{s})] \le \frac{(1 \vee \|\theta\|)\, k_2 + |\ln(2b + 1)|}{n} \tag{14} \]

where $C$ is a universal positive constant. However, if for instance $k_2$ is large and if most of the components of $\theta$ are small or null, the preceding proposition can be used to improve substantially the risk of our estimators. For simplicity, assume that

\[ k_* = |\{ i \in \{1, \dots, k_2\},\ \theta_i \neq 0 \}| \]

is small. We then define $\mathcal{M}$ the set of all subsets of $\{1, \dots, k_2\}$, and for each $m \in \mathcal{M}$, the set

\[ W_m = \{ (y_1, \dots, y_{k_2}),\ \forall i \notin m,\ y_i = 0 \} \subset \mathbb{R}^{k_2}. \]

We apply Proposition 4.3 with

\[ \mathbb{W} = \{ W_m,\ m \in \mathcal{M} \} \quad \text{and} \quad \forall m \in \mathcal{M}, \quad \Delta(W_m) = 1 + |m| + \ln \binom{k_2}{|m|}. \]

This leads to an estimator $\hat{s}$ such that

\[ C\, \mathbb{E}[H^2(s, \hat{s})] \le \frac{(1 \vee \ln k_2 \vee \|\theta\|)(1 \vee k_*) + |\ln(2b + 1)|}{n}, \]

which improves inequality (14) when $k_*$ is small and $k_2$ large.
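The weight condition of Proposition 4.3 and the gain over inequality (14) can be illustrated numerically. The sketch below assumes the weights $1 + |m| + \ln\binom{k_2}{|m|}$ and compares the two right-hand sides up to the universal constant $C$; the function names and the numerical values are hypothetical.

```python
import math

def delta(m_size, k2):
    """Assumed weight of W_m: 1 + |m| + ln C(k2, |m|)."""
    return 1 + m_size + math.log(math.comb(k2, m_size))

def kraft_sum(k2):
    """Sum of exp(-delta(W_m)) over all subsets m, grouped by cardinality.

    There are C(k2, j) subsets of size j; logs are used to avoid overflow."""
    return sum(math.exp(math.log(math.comb(k2, j)) - delta(j, k2))
               for j in range(k2 + 1))

def dense_bound(theta_norm, k2, b, n):
    """Right-hand side of (14), up to the universal constant."""
    return (max(1.0, theta_norm) * k2 + abs(math.log(2 * b + 1))) / n

def sparse_bound(theta_norm, k2, k_star, b, n):
    """Right-hand side of the sparse bound, up to the universal constant."""
    lead = max(1.0, math.log(k2), theta_norm) * max(1, k_star)
    return (lead + abs(math.log(2 * b + 1))) / n
```

For instance, with `k2 = 1000`, `k_star = 3`, `theta_norm = 2`, `b = 0.5` and `n = 500`, `sparse_bound` is much smaller than `dense_bound`, and `kraft_sum(k2)` stays below 1 for every `k2` since it equals $\sum_{j=0}^{k_2} e^{-1-j}$.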



5. Parametric models.

In this section, we consider the situation where the intensity of each process $N_i$ belongs (or is close) to the same parametric model. Let us define $\Theta$ a subset of $\mathbb{R}^k$, and let us denote by $\mathcal{F} = \{f_\theta,\ \theta \in \Theta\}$ a class of functions of $\mathbb{L}^2(\mathbb{T}, \mu)$. Our aim is to estimate $s$ when, for each $i \in \{1, \dots, n\}$, the square root of the intensity of the Poisson process $N_i$, $\sqrt{s(\cdot, x_i)}$, is (or is close to) an element of $\mathcal{F}$. We thus introduce the class of functions $\mathscr{F}$ defined by

\[ \mathscr{F} = \{ (t, x) \mapsto f_{u(x)}(t),\ \text{where } u \text{ is a map on } \mathbb{X} \text{ with values in } \Theta \}. \]

For instance, if $\mathcal{F}$ corresponds to the Duane model (see Section 4.3), we assume that there exist two functions $a$ and $b > -1/2$ on $\mathbb{X}$ such that $\sqrt{s(t, x)}$ is close to a function of the form $a(x) t^{b(x)}$. The class $\mathscr{F}$ is then the set of all functions of this form. For more general classes we have to consider the following assumption.

Assumption 5.1. There exists $\mathcal{K}$ an at most countable collection of closed convex subsets of $\mathbb{R}^k$ such that $\Theta = \bigcup_{K \in \mathcal{K}} K$. Moreover, for each $K \in \mathcal{K}$, there exist $\alpha(K) = (\alpha_j(K))_{1 \le j \le k} \in (0, 1]^k$ and $R(K) = (R_j(K))_{1 \le j \le k} \in (\mathbb{R}_+^*)^k$ such that

\[ \forall \theta, \theta' \in K, \quad \|f_\theta - f_{\theta'}\|_t \le \sum_{j=1}^{k} R_j(K)\, |\theta_j - \theta'_j|^{\alpha_j(K)}. \tag{15} \]

Typically, relation (15) does not hold for $K = \Theta$ but rather for some subsets $K$ of $\Theta$. In Sections 5.1, 5.2 and 5.3, we give some examples of processes satisfying the assumption above. We assume in these sections that $\mu$ is the Lebesgue measure.

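5.1. The Duane model. As explained previously, this model is defined by $\mathcal{F} = \{f_\theta,\ \theta \in \Theta\}$ where $\Theta = \mathbb{R} \times (-1/2, +\infty)$, and where $f_\theta$ is of the form $f_\theta(t) = \theta_1 t^{\theta_2}$ for $\theta = (\theta_1, \theta_2) \in \Theta$ and $t \in \mathbb{T} = (0, 1]$. The class $\mathscr{F}$ is thus

\[ \mathscr{F} = \{ (t, x) \mapsto a(x)\, t^{b(x)},\ \text{where } a : \mathbb{X} \to \mathbb{R} \text{ and } b : \mathbb{X} \to (-1/2, +\infty) \}. \]

In this situation, the lemma below gives a collection $\mathcal{K}$ such that Assumption 5.1 holds.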

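Lemma 5.1. For all $i_1, i_2 \in \mathbb{N}^*$, let $K_{i_1, i_2} = [-i_1, i_1] \times [-1/2 + 1/i_2, +\infty)$. Assumption 5.1 is satisfied for the Duane model with $\mathcal{K} = \{K_{i_1, i_2},\ i_1, i_2 \in \mathbb{N}^*\}$, and with

\[ \alpha(K_{i_1, i_2}) = (1, 1) \quad \text{and} \quad R(K_{i_1, i_2}) = \left( i_2^{1/2},\ \sqrt{2}\, i_1 i_2^{3/2} \right). \]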

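5.2. The Goel-Okumoto and S-shaped models. As in Section 4.3, we study these two models simultaneously. For $k \in \mathbb{N}^*$ and $\theta = (\theta_1, \theta_2) \in \Theta = \mathbb{R}_+ \times \mathbb{R}_+^*$, we set $f_\theta(t) = \sqrt{\theta_1 t^{k-1} e^{-\theta_2 t}}$ for $t \in \mathbb{T} = (0, +\infty)$ and

\[ \mathcal{F} = \left\{ t \mapsto \sqrt{\theta_1 t^{k-1} e^{-\theta_2 t}},\ (\theta_1, \theta_2) \in \mathbb{R}_+ \times \mathbb{R}_+^* \right\}. \tag{16} \]

The class $\mathscr{F}$ is thus

\[ \mathscr{F} = \left\{ (t, x) \mapsto \sqrt{a(x)\, t^{k-1} e^{-b(x) t}},\ \text{where } a : \mathbb{X} \to \mathbb{R}_+ \text{ and } b : \mathbb{X} \to \mathbb{R}_+^* \right\}. \]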

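Lemma 5.2. For all $i_1, i_2 \in \mathbb{N}^*$, let $K_{i_1, i_2} = [0, i_1] \times [1/i_2, +\infty)$. Assumption 5.1 is satisfied for the class $\mathcal{F}$ defined by (16) with $\mathcal{K} = \{K_{i_1, i_2},\ i_1, i_2 \in \mathbb{N}^*\}$, and with

\[ \alpha(K_{i_1, i_2}) = (1/2, 1/2) \quad \text{and} \quad R(K_{i_1, i_2}) = \left( \sqrt{(k-1)!\, i_2^{k}},\ \sqrt{k!\, i_1 i_2^{k+1}} \right). \]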






5.3. Translations and dilatations from a reference model. Let $\varphi$ be a function defined on an interval $I$ of $\mathbb{R}$. We assume that $\mathbb{T}$ is an interval and we define $J_2$ and $J_3$ two intervals such that, for all $t \in \mathbb{T}$, $\theta_2 \in J_2$ and $\theta_3 \in J_3$, $t\theta_2 - \theta_3$ belongs to $I$. In this section, we study the situation in which the class $\mathscr{F}$ gathers the functions of the form

\[ (t, x) \mapsto a_1(x)\, \varphi\big( t a_2(x) - a_3(x) \big) \]

where $a_1, a_2, a_3$ are real-valued functions on $\mathbb{X}$ such that $a_2(\mathbb{X}) \subset J_2$ and $a_3(\mathbb{X}) \subset J_3$. This class corresponds to several parametric models $\mathcal{F}$. For instance, the Musa-Okumoto model (see Musa and Okumoto (1984)) corresponds to $\varphi(t) = 1/\sqrt{1 + t}$ and the Cox-Lewis processes (see Cox and Lewis (1966)) to $\varphi(t) = e^{t}$. The case where there exist three constant functions $a_1, a_2, a_3$ such that $\sqrt{s(t, x)} = a_1 \varphi(t a_2 - a_3)$ has been dealt with in several papers. We refer for instance to Dabye (1993), Aubry and Dabye (2001) and the references therein for estimating $s$ by using a maximum likelihood estimator or a minimum distance estimator.

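We consider the following assumption on $\varphi$.

Assumption 5.2. For all $\theta_2 \in J_2$ and $\theta_3 \in J_3$, the function $t \mapsto \varphi(t\theta_2 - \theta_3)$ belongs to $\mathbb{L}^2(\mathbb{T}, \mu)$. Moreover, there exist $Q_2, Q_3 > 0$ and $0 < \beta, \gamma \le 1$ such that, for all $\theta_2, \theta_2' \in J_2$ and all $\theta_3, \theta_3' \in J_3$,

\[ \int_{\mathbb{T}} \big( \varphi(t\theta_2 - \theta_3) - \varphi(t\theta_2' - \theta_3') \big)^2 d\mu(t) \le Q_2^2\, |\theta_2 - \theta_2'|^{2\beta} + Q_3^2\, |\theta_3 - \theta_3'|^{2\gamma}. \]

This assumption is satisfied in particular if $\mathbb{T}$ is bounded, and if $\varphi$ admits a bounded derivative on $I \setminus I'$ where $I'$ is at most countable.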

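Lemma 5.3. Assume that Assumption 5.2 holds. Let $\Theta = \mathbb{R} \times J_2 \times J_3$ and for $\theta = (\theta_1, \theta_2, \theta_3) \in \Theta$ and $t \in \mathbb{T}$, let $f_\theta(t) = \theta_1 \varphi(t\theta_2 - \theta_3)$. We then define for $i \in \mathbb{N}^*$, $K_i = [-i, i] \times J_2 \times J_3$. Assumption 5.1 holds with $\mathcal{K} = \{K_i,\ i \in \mathbb{N}^*\}$, $\alpha(K_i) = (1, \beta, \gamma)$ and

\[ R(K_i) = \left( \sqrt{2} \sup_{(\theta_2, \theta_3) \in J_2 \times J_3} \left( \int_{\mathbb{T}} \varphi^2(\theta_2 t - \theta_3)\, d\mu(t) \right)^{1/2},\ \sqrt{2}\, i\, Q_2,\ \sqrt{2}\, i\, Q_3 \right). \]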

5.4. A model selection theorem. The main theorem of this section is the following.

Theorem 5.1. Assume that Assumption 5.1 holds. Let $\mathbb{W}_1, \dots, \mathbb{W}_k$ be $k$ families of finite dimensional linear spaces of $\mathbb{L}^2(\mathbb{X}, \nu_n)$. Let $\Delta_{\mathcal{K}}$ be some non-negative mapping on $\mathcal{K}$ such that $\sum_{K \in \mathcal{K}} e^{-\Delta_{\mathcal{K}}(K)} \le 1$, and for each $1 \le j \le k$, let $\Delta_j$ be a non-negative mapping on $\mathbb{W}_j$ such that $\sum_{W_j \in \mathbb{W}_j} e^{-\Delta_j(W_j)} \le 1$.

One can build an estimator $\hat{s}$ such that, for all $K \in \mathcal{K}$, all maps $u$ from $\mathbb{X}$ with values in $K$, and $f \in \mathscr{F}$ of the form $f(t, x) = f_{u(x)}(t)$,

\[ C\, \mathbb{E}[H^2(s, \hat{s})] \le d_2^2(\sqrt{s}, f) + \sum_{j=1}^{k} \varepsilon_j(u_j) + \frac{\Delta_{\mathcal{K}}(K)}{n}, \]

where $\varepsilon_j(u_j)$ is defined by

where

\[ \tau_{u,j}(n) = 1 + \ln n + \ln(1 \vee R_j(K)) + \ln(1 \vee \|u_j\|_x), \]

and where $C > 0$ depends only on $k$ and the maps $\alpha_1, \dots, \alpha_k$.

Roughly speaking, this result says that the risk bound we get when $\sqrt{s}$ is of the form $\sqrt{s(t, x)} = f_{u(x)}(t)$ corresponds to the one we would get if we could apply a model selection theorem on the components $u_1, \dots, u_k$ separately. As in Section 3, each term $\varepsilon_j(u_j)$ can be controlled under some structural or smoothness assumptions on $u_j$. For instance, if $\mathbb{X} = [0, 1]^{k_2}$ and if $u_j$ is assumed to belong to a Hölder class on $[0, 1]^{k_2}$, by a similar argument as the one used in the proof of Corollary 3.1, $\varepsilon_j(u_j)$ is such that

where $\beta_j$ is such that $u_j \in \mathcal{H}^{\beta_j}([0, 1]^{k_2})$ and where $C_j > 0$ depends only on $k_2$ and $\beta_j$.

In particular, if $\alpha_j(K) = 1$ and if $n$ is large, $\varepsilon_j(u_j)$ is of order $(\ln n / n)^{2\beta_j / (2\beta_j + k_2)}$. Apart from the logarithmic factor, this corresponds to the estimation rate of a Hölderian function on $[0, 1]^{k_2}$. The corollary below illustrates this result in the framework of the Duane model.

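Corollary 5.1. One can build an estimator $\hat{s}$ such that, for all $\alpha \in (\mathbb{R}_+^*)^{k_2}$, $\beta \in (\mathbb{R}_+^*)^{k_2}$, for all $a \in \mathcal{H}^{\alpha}([0, 1]^{k_2})$, $b \in \mathcal{H}^{\beta}([0, 1]^{k_2})$ satisfying $b > -1/2$, and for a function $f$ of the form $f(t, x) = a(x)\, t^{b(x)}$,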

( iVllalloo A ^,,,4^ /lnn\5^ 



lAinf^g[0^ip2(26(^) + l)3^ 
„1 Vlnn 



n 



n 

where $C > 0$ depends on $k_2$, $\max_{1 \le j \le k_2} \alpha_j$, $\max_{1 \le j \le k_2} \beta_j$, and where $C'$ depends on $L(a)$, $L(b)$, $\alpha$, $\beta$, $\|a\|_\infty$, $\|b\|_\infty$ and $\inf_{x \in [0, 1]^{k_2}} (2b(x) + 1)$.

5.5. Change point detection. In the case where the intensity $s_i$ of each $N_i$ is of the form $\sqrt{s_i(t)} = f_{\theta_i}(t)$, a natural way to control the risk of our estimator $\hat{s}$ is to consider some assumptions on the map $i \mapsto \theta_i$. This problem amounts to choosing suitable collections $\mathbb{W}_1, \dots, \mathbb{W}_k$ to approximate functions on $\mathbb{X} = \{1, \dots, n\}$.

In this section, we focus on the case where the map $i \mapsto \theta_i$ is piecewise constant with a small number of jumps, and we assume that $n \ge 2$. We introduce the set $\mathcal{P}$ of partitions of $\{1, \dots, n\}$ into intervals and aim at estimating $s$ when it is of the form

\[ \forall (t, i) \in \mathbb{T} \times \{1, \dots, n\}, \quad \sqrt{s(t, i)} = \sum_{I \in P} f_{\theta_I}(t)\, 1_I(i), \tag{17} \]

where $P \in \mathcal{P}$ is an unknown partition and $(\theta_I)_{I \in P}$ an unknown family of elements of $\Theta$.

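We define for each partition $P \in \mathcal{P}$ the linear space of piecewise constant functions

\[ W_P = \Big\{ \sum_{I \in P} a_I 1_I,\ (a_I)_{I \in P} \in \mathbb{R}^{|P|} \Big\}, \]

and apply Theorem 5.1 with the collections and maps defined by

\[ \forall j \in \{1, \dots, k\}, \quad \mathbb{W}_j = \{W_P,\ P \in \mathcal{P}\} \quad \text{and} \quad \Delta_j(W_P) = 1 + |P| + \ln \binom{n-1}{|P|-1}. \]

This leads to the result below.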

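Corollary 5.2. Assume that Assumption 5.1 and relation (17) hold. Let $\Delta_{\mathcal{K}}$ be a non-negative map on $\mathcal{K}$ such that $\sum_{K \in \mathcal{K}} e^{-\Delta_{\mathcal{K}}(K)} \le 1$. One can build an estimator $\hat{s}$ such that

\[ C\, \mathbb{E}[H^2(s, \hat{s})] \le |P|\, \frac{\ln n + C'}{n} + \frac{\Delta_{\mathcal{K}}(K)}{n}, \]

where $K$ is such that $\theta_I \in K$ for all $I \in P$, where $C > 0$ depends only on $k$ and $\alpha_1, \dots, \alpha_k$, and where $C'$ is given by

\[ C' = 1 + \sup_{1 \le j \le k} \ln(1 + R_j(K)) + \sup_{I \in P} \ln(1 + \|\theta_I\|_\infty), \]

where $\|\theta_I\|_\infty = \sup_{1 \le j \le k} |(\theta_I)_j|$.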
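The weights attached to the partition spaces can be checked by brute force for small $n$. The sketch below assumes the weight $1 + |P| + \ln\binom{n-1}{|P|-1}$ and uses the fact that a partition of $\{1, \dots, n\}$ into intervals is determined by a subset of the $n - 1$ possible cut points; function names are hypothetical.

```python
import math
from itertools import combinations

def interval_partitions(n):
    """Yield every partition of {1, ..., n} into consecutive intervals."""
    for r in range(n):  # r = number of cut points, so the partition has r + 1 pieces
        for cuts in combinations(range(1, n), r):
            bounds = [0, *cuts, n]
            yield [list(range(lo + 1, hi + 1)) for lo, hi in zip(bounds, bounds[1:])]

def delta_partition(size, n):
    """Assumed weight of W_P when |P| = size: 1 + |P| + ln C(n - 1, |P| - 1)."""
    return 1 + size + math.log(math.comb(n - 1, size - 1))

def kraft_sum_partitions(n):
    """Sum of exp(-delta(W_P)) over all interval partitions P (small n only)."""
    return sum(math.exp(-delta_partition(len(P), n)) for P in interval_partitions(n))
```

Since there are $\binom{n-1}{D-1}$ partitions with $D$ pieces, the sum reduces to $\sum_{D=1}^{n} e^{-1-D}$, which is smaller than 1; the enumeration is only meant as a check and grows like $2^{n-1}$.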



For illustration purposes, in the context of the Duane model (see Section 5.1), there exist $a_1, \dots, a_n \in \mathbb{R}$ and $b_1, \dots, b_n \in (-1/2, +\infty)$ such that $\sqrt{s_i(t)} = a_i t^{b_i}$ for all $t \in (0, 1]$. The preceding corollary then provides an estimator $\hat{s}$ such that

\[ C\, \mathbb{E}[H^2(s, \hat{s})] \le (1 + r_1 + r_2)\, \frac{\ln n + C'}{n} \]

where $r_1$ and $r_2$ are the numbers of jumps of the maps $i \mapsto a_i$ and $i \mapsto b_i$ respectively, where $C$ is a universal positive constant, and where $C'$ depends on $\sup_{1 \le i \le n} |a_i|$, $\sup_{1 \le i \le n} |b_i|$ and $\inf_{1 \le i \le n} (2b_i + 1)$.

The preceding collections $\mathbb{W}_1, \dots, \mathbb{W}_k$ can also be used to approximate the map $i \mapsto \theta_i$ under other assumptions, such as smoothness ones. For instance, an approximation theorem for monotone functions can be found in Baraud and Birgé (2009) and can be used to deal with the situation where some components of the map $i \mapsto \theta_i$ are monotone.
6. Proofs

6.1. Proofs of Section 2. 

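6.1.1. Proof of Theorem 2.1. Let us introduce the following measure $N$ on the measurable space $(\mathbb{T} \times \mathbb{X}, \mathcal{T} \otimes \mathcal{X})$:

\[ \forall A \in \mathcal{T} \otimes \mathcal{X}, \quad N(A) = \frac{1}{n} \sum_{i=1}^{n} (N_i \otimes \delta_{x_i})(A). \]

This measure is related to $M$ (defined in Section 1) by the relation

\[ \forall A \in \mathcal{T} \otimes \mathcal{X}, \quad \mathbb{E}[N(A)] = \int_A s(u)\, dM(u). \]

The problem of estimating such an $s$ from two observed measures $(N, M)$ has been considered in Baraud and Birgé (2009) and Baraud (2010). As explained in the introduction, our approach is based on robust testing, and we borrow the test of Baraud (2010). This test is a function $T$ on $\mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M) \times \mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M)$ with values in $\mathbb{R}$. For two functions $s_1$ and $s_2$ of $\mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M)$, it is defined by

\begin{align*}
T(s_1, s_2) = {} & \frac{1}{2n} \sum_{i=1}^{n} \int_{\mathbb{T}} \sqrt{\frac{s_1(t, x_i) + s_2(t, x_i)}{2}} \left( \sqrt{s_2(t, x_i)} - \sqrt{s_1(t, x_i)} \right) d\mu(t) \\
& + \frac{1}{\sqrt{2}\, n} \sum_{i=1}^{n} \int_{\mathbb{T}} \frac{\sqrt{s_2(t, x_i)} - \sqrt{s_1(t, x_i)}}{\sqrt{s_1(t, x_i) + s_2(t, x_i)}}\, dN_i(t) \\
& - \frac{1}{2n} \sum_{i=1}^{n} \int_{\mathbb{T}} \big( s_2(t, x_i) - s_1(t, x_i) \big)\, d\mu(t).
\end{align*}

Given two functions $s_1$ and $s_2$, this test returns the one deemed closest to $s$. If $T(s_1, s_2) > 0$, we prefer $s_2$ to $s_1$, and if $T(s_1, s_2) < 0$ we prefer $s_1$ to $s_2$. If $T(s_1, s_2) = 0$ we choose arbitrarily between $s_1$ and $s_2$.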
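To make the test concrete, here is a minimal numerical sketch. It assumes that $\mu$ is the Lebesgue measure on a bounded interval, that each $N_i$ is observed as a list of event times, and that the test has the three-term form borrowed from Baraud (2010): two integrals against $\mu$ and one against the $N_i$. The integrals against $\mu$ are approximated by the trapezoidal rule; `robust_test` and its arguments are hypothetical names.

```python
import math

def robust_test(s1, s2, events, covariates, t_grid):
    """Approximate T(s1, s2).

    s1, s2     : candidate intensities, called as s(t, x)
    events     : events[i] is the list of points of the process N_i
    covariates : covariates[i] is the covariate x_i
    t_grid     : increasing grid covering the time interval (mu assumed Lebesgue)
    """
    n = len(events)
    total = 0.0
    for pts, x in zip(events, covariates):
        # Deterministic part: the two integrals with respect to mu.
        def g(t):
            a, b = s1(t, x), s2(t, x)
            return 0.5 * math.sqrt((a + b) / 2) * (math.sqrt(b) - math.sqrt(a)) - 0.5 * (b - a)
        vals = [g(t) for t in t_grid]
        total += sum(0.5 * (vals[j] + vals[j + 1]) * (t_grid[j + 1] - t_grid[j])
                     for j in range(len(t_grid) - 1))
        # Stochastic part: the integral with respect to N_i is a sum over its points.
        for t in pts:
            a, b = s1(t, x), s2(t, x)
            if a + b > 0:  # convention 0/0 = 0
                total += (math.sqrt(b) - math.sqrt(a)) / math.sqrt(2 * (a + b))
    return total / n
```

A positive value favours `s2`, a negative value favours `s1`. The statistic is antisymmetric, so `robust_test(s1, s2, ...)` and `robust_test(s2, s1, ...)` always lead to the same choice.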

We shall derive a T-estimator $\hat{s}$ from this test by using the device of Birgé (2006). In order to do so, we have to verify that this test is robust. This is the purpose of the lemma below, whose proof is postponed to Section 6.1.2.

Lemma 6.1. There exist $\kappa_0 > 0$ and a positive function $a$ on $(\kappa_0, +\infty)$ such that, for all $\kappa > \kappa_0$, for all $x \in \mathbb{R}$, and all $s_1, s_2 \in \mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M)$ satisfying $\kappa H(s, s_1) \le H(s_1, s_2)$,

\[ \mathbb{P}[T(s_1, s_2) \ge x] \le \exp\left[ -n\, a(\kappa) \left( H^2(s_1, s_2) + x \right) \right]. \]

We now derive from the collection $\mathbb{V}$ a family of D-models (in the sense of Definition 4 of Birgé (2006)) for the metric space $(\mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M), H)$ by the lemma below.

Lemma 6.2. For all $\eta > 0$, there exists an at most countable collection $\{T_V(\eta),\ V \in \mathbb{V}\}$ such that each $T_V(\eta)$ is a D-model of the metric space $(\mathbb{L}^1_+(\mathbb{T} \times \mathbb{X}, M), H)$ with parameters $\eta$, $63 D_V(\eta/2)$ and 1. Moreover,

\[ H(s, T_V(\eta)) \le 2\sqrt{2}\, \big( d_2(\sqrt{s}, V) + \eta \big), \tag{18} \]

and for all $f \in T_V(\eta)$, there exists $g \in V$ such that $\sqrt{f} = g \vee 0$.



Proof. Thanks to Proposition 7 of Birge (2006), we can define for each V gY and 
r/ > a D-model S'y{r]) C V of L2(T x X,M) with parameters r/, 7L>y(r//2) and 1, 
such that 

V/GF, d2{f,S'y{v))<V- 

We apply for each V eY, Proposition 12 of Birge (2006), with T = S'y{r]), Mq = 
L^(T X X, M), M' = L2(T X X, M) and tt defined by 7f(/) = / V 0. This gives a 
subset S'{,{r]) C (T x X,M) such that 

V/ G L2(T X X, M), Vx > 2, \Sv{ri) n H(/, xry)! < exp (63Dy (r/) x^) 

where ^(/, xrj) is the ball centered at / with radius xrj of the metric space (L^(T x 
X,M),d2), and such that 

V/ € L^(T X X,M), d2{f,S'^{r])) < 4^2(7, (77)). 
The lemma holds with Tv{v) = {/^ / G -5y(??)}- □ 

With no loss of generality, we can assume that $D_V$ is non-increasing and that $a(2\kappa_0) \le 1$, where $a$ and $\kappa_0$ are given in Lemma 6.1. Let us set



/ 
For all $\eta \ge 2\eta_V$, $T_V(\eta)$ is a D-model with parameters $\eta$, $63 D_V(\eta_V)$ and 1. It
follows from Theorem 5 of Birge (2006) (applied with M=Y, S = VJv&Tv{2r]'y V 
v/21n-i(a(2Ko))-iA(F)), Dm = 63L>y (r?y), = (27?^)^ V 21n-i(a(2Ko))-^A(y)), 
that there exists 

s G IJ Ty ((2r/^)2 V 21n-i(a(2ACo))"'A(y)) 

such that 

CE [if2(s,s)] < mf^ji^' (^,ry ((2r?;.)2 v21n-i(a(2«o))-'A(F))) +r?;2 ^ A^j ^ 
where C is an universal positive constant. By using inequality (18), 
C'E [H\s, s)] < mf^ |di (V^, y) + ,7(2 + ^1 . 
The conclusion follows. □ 
6.1.2. Proof of Lemma 6.1. We start with the following concentration inequality.

Lemma 6.3. Let $f_1, \dots, f_n$ be $n$ measurable bounded functions satisfying

\[ \frac{1}{n} \sum_{i=1}^{n} \int_{\mathbb{T}} f_i^2(t)\, s_i(t)\, d\mu(t) \le v. \]

The following inequality holds for all $r > 0$:

\[ \mathbb{P}\left[ \frac{1}{n} \sum_{i=1}^{n} \int_{\mathbb{T}} f_i(t) \big( dN_i(t) - s_i(t)\, d\mu(t) \big) \ge r \right] \le \exp\left( -n\, \frac{v}{\rho^2}\, h\left( \frac{\rho r}{v} \right) \right) \]

where $\rho = \max_{1 \le i \le n} \|f_i\|_\infty$ and where $h$ is the function defined for $u \in (-1, +\infty)$ by $h(u) = (1 + u) \ln(1 + u) - u$.
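For intuition about the size of this bound, the sketch below evaluates a Bennett-type bound of the form $\exp(-n\, v\, \rho^{-2}\, h(\rho r / v))$, which is how we read the garbled display, and compares it with the weaker Bernstein-type bound $\exp(-n r^2 / (2(v + \rho r/3)))$ obtained from the classical inequality $h(u) \ge u^2 / (2(1 + u/3))$ for $u \ge 0$. Function names are hypothetical.

```python
import math

def h(u):
    """Bennett function h(u) = (1 + u) ln(1 + u) - u, for u > -1."""
    return (1 + u) * math.log1p(u) - u

def bennett_bound(r, v, rho, n):
    """Bennett-type bound exp(-n * v / rho**2 * h(rho * r / v)) (assumed form)."""
    return math.exp(-n * v / rho ** 2 * h(rho * r / v))

def bernstein_bound(r, v, rho, n):
    """Bernstein-type relaxation exp(-n r^2 / (2 (v + rho r / 3)))."""
    return math.exp(-n * r ** 2 / (2 * (v + rho * r / 3)))
```

For every `r > 0` the Bennett-type value is at most the Bernstein-type value, which is the kind of simplification used later in the proof of Lemma 6.1.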

Proof. By homogeneity we can assume that $\rho = 1$. We assume moreover that for each $i \in \{1, \dots, n\}$, $f_i$ is a piecewise constant function. There thus exist $k_1, \dots, k_n \in \mathbb{N}^*$ and a family $(a_{i,j})_{1 \le i \le n,\ 1 \le j \le k_i}$ of elements of $[-1, 1]$ such that

\[ \forall t \in \mathbb{T}, \quad f_i(t) = \sum_{j=1}^{k_i} a_{i,j}\, 1_{A_{i,j}}(t) \]

where the $A_{i,j}$ are measurable sets of $\mathbb{T}$ such that $A_{i,j} \cap A_{i,j'} = \emptyset$ for all $j \neq j'$. We have for all $t > 0$:

n 

i=l 
n ki 

= ^ ^ InE 

1=1 j=i 

n ki 

= i^Mij)) (e*"^'^ - tai,j - 1) . 

i=i j=i 

By using the monotony of the function x i->- (e^ — x — l)/x'^, 

n ki 
i=l 3=1 

< v{e*-t-l). 

This result still holds when the $f_i$ are not piecewise constant, by using the fact that a measurable function can be approximated by piecewise constant functions. The concentration inequality is then deduced from the Cramér-Chernoff method, see Chapter 2 of Massart (2003).
□

Let us define the function ijj on with values into [—l/^/2, by 



where we use the convention 0/0 = and a/oo = for all a > 0. Let then 

^(si,'S2) = r(si,s2) -iE[r(si,s2)] 

= / V(si,S2)dM-E ( / V(si,S2)dM). 
We derive from Corollary 1 of Baraud (2010) that 

E [T{si,S2)] < (l + ^) H\s, si) - (l - ^) H\s, S2) 

and thus, 

P [r(si,S2) >aj] = P [Z(si,S2) >x-E[r(si,S2)]] 

< P Z{si,S2)>x-{l + ^H''{s,si)+[l-^H''{s,S2) 

Moreover, the random variable Z{si, S2) can be written as 

1 " 

Z{S1,S2) = -^(^i(si,S2) -E[Zi(si,S2)]) 
" i=l 

where for each i G {1, . . . n}, 

Zi{si,S2) = J ip {si{-,Xi),S2{-,Xi)) dNi-E (^j il^{si{-,Xi),S2{-,Xi)) diVj^ . 

We apply Lemma 6.3 with fi = ip {si{-, Xi), S2{-,Xi)), and v = H'^{s, si) + if^(s, S2) + 
H^{si, S2). The calculus of v is provided by the proof of Proposition 3 of Baraud 

(2010). Consequently, ii r = x - (l + H'^{s, si) + (l - ^) H'^{s, S2) is non- 
negative. 



2 

nr 



2u + ^ 



^ [T{si,S2)>x] < exp -- 

To avoid some complicated calculations, we bound the preceding probability from above without optimisation and we do not make the function $a$ explicit. We merely say that $r \ge x + C H^2(s_1, s_2)$ for $C = -(1 + 1/\sqrt{2})\, \kappa^{-2} + (1 - 1/\sqrt{2})(1 - \kappa^{-1})^2$. For $\kappa$ large enough, $C > 0$ and hence,

/ nix^r.n^if^. ..n^^2 

[r(si,S2) >x\ < exp 



+ CH\si,S2)f 



6(1 + C)H^su S2) + # (x + CH^si,S2))^ 



3 

< exp [— an (x + Ci?^(si, S2))] 
where $a$ is some positive function of $C$. If $x + C H^2(s_1, s_2) \ge 0$ then $r \ge 0$ and thus the preceding inequality holds. If $x + C H^2(s_1, s_2) < 0$ this inequality still holds since $\mathbb{P}[T(s_1, s_2) \ge x] \le 1$. □

6.1.3. Proof of Proposition 2.1. We define V = U ^^^Y^ and A on V by 
A{V)= inf (A^(y) + A(^)). 

The assumptions of Theorem 2.1 are satisfied for (V, A) since 
Consequently, there exists an estimator s such that: 

A(^)\ 



5)] < mf^ I^M {dliVs, /) + e^if)} + ^ j 

with = infygv^ I^K/' ^) + ^y + ^^n^^ | ^'^d where C is an universal posi- 

tive constant. □ 

6.1.4. Proof of Theorem 2.3. For any function / on T x Af, we set / the function on 
T X X defined by f{t,x) = f{t,x{t)). Let for any F G V, F = {/, f e V} and let 
Y = {V,V e V}. We define A on V by A{V) = A{V). Theorem 2.1 applied with 
{V, A) leads to an estimator s such that, for all / G T x , 

CE [H\s, s)] < d\ (Vi, f) + mX [dl (/, V) + dimy + A(l^) | _ 



yev - V ' ji 



The proof follows from the fact that dim V < dim V and that d| (/, F) < (/, V). 

□ 



6.2. Proofs of Section 3. 

6.2.1. Proof of Corollary 3.1. This corollary can be easily deduced from Lemma 1 
and Proposition 1 of Baraud and Birge (2011). □ 

6.2.2. Proof of Corollary 3.2. For all j G {l,...,/ci} we consider the function 
Uj{t,x) = t, and for j G {ki + 1, . . . , fci + /}, the function Uj{t,x) =< 0,x >. 
The function / can be written as the composite function 

f = go (ui, . . .,Uki,Uki+i, . . .,Uki+i). 

Let W be a collection of finite dimensional linear spaces of bounded functions on 
[0, 1]''! X [-1, 1]' and A > 1 be a map on W such that J2wew e"^"^^) < 1. We apply 
Corollary 1 of Baraud and Birge (2011) with Tj = {{(t, x) i-> t}} for j G {1, . . . , ki}, 
$T_j = \{\{(t, x) \mapsto \langle x, \theta \rangle,\ \theta \in \mathbb{R}^{k_2}\}\}$ for $j \in \{k_1 + 1, \dots, k_1 + l\}$ and $\mathbb{F} = \mathbb{W}$. This leads to an estimator $\hat{s}$ such that, for all $W \in \mathbb{W}$,

r 9. .1 9 / ^ N lnnVln(||o||2fe-i) VI 
CE[H^{s,s)] < dl{^,f) + ' ^ A;2 

,2 , ,,,, dimVF + A(W) 

where C > depends on a, ki and /. We finally choose for W the collection of linear 
spaces given by Proposition 1 of Baraud and Birge (2011) and use their Lemma 1. 



6.2.3. Proof of Corollary 3.3. Let v e ([0, l]''^) satisfying to g -^7 ([q, if^), 
let g £7i ([0, ||f ||^]) and let / G ^ be the function f{t,x) = v{x)g{tv'^{x)). Let us 
consider the functions on T x X 

Vi{t,x) = v{x)/\\v\\oc, V2{t)=t, V3{t,x) =V^{t)/\\v\\l^, 

and let (/? be a function on [—1, 1]^ defined by 

Va, b,ce [-1, 1], (p{a, b, c) = \\v\\ooag (||v||^6c) . 

The function / can be thus written as the composite function ip o (vi,V2,vs). 

Furthermore, according to Definition 1 of Baraud and Birge (2011), we can choose 
for the modulus of continuity of (p the function w^ defined by 

Vx,y,zG [0,2], w^{x,y,z) = \\v\\^ (^\\g\\^x,\\vf^^''''^\\g\\^y^^^ 

Let then Vi and V2 be two collections of finite dimensional linear spaces of L'^(X, u^) 
to approximate v and v'^ respectively. Let us note g{x) = \\v\\oo9{\\v\\'^x) and let 
W be a collection of finite dimensional linear spaces of L°°([0, 1]) to approximate g. 
For each W eW, let W' = {{a, b, c) ^ ah {be) , ^ G T^} and let W' the collection of 
all the W' when W varies among W. 

We apply Theorem 2 of Baraud and Birge (2011) with Ti = Vi, T2 = {{(t, z) t}}, 
T3 = V2 and ¥ = W'. This leads to an estimator s such that, for all Vi G Vi, V2 £ V2 
and W eW, 

CE [H' {s, s)] < dl (V^, /) + \\gtodl{v. V.) + ^ In (Moo||g||oo) V 1 ^^.^^^ ^ 

lnnVlnf||^||JJ'("^')||5|U) VI 
+ ^2) + ^ ^ '- (dim 1^2 V 1) 

,2 , dim VI 

n 

We conclude by choosing for Vi, V2 and W collections of piecewise constant poly- 
nomial linear spaces as in the preceding proofs. □ 
6.2.4. Proof of Lemma 3.1. The first part of the lemma is given by Proposition 4
of Baraud and Birge (2011). For the second part, we define g on [0, 1/2] by g{x) = 
(1 - 2xY and V on [0, 1] by v{x) = ^yl - l/2x^. Then, v e ([0, 1]) and g e 
{% \Mlo])- If «' > tlie function f{t,0) = g{t) does not belong to H"'([0, 1]), 
whereas if a' f3' > a/3, the function /(l/2,x) = 2~°'^/l — l/2x^x°'^ does not belong 
to1«"'^'([0,l]). □ 

6.3. Proofs of Section 4. 

6.3.1. Proof of Lemma 4-1- The proof of this proposition requires the following 
elementary lemma. 

Lemma 6.4. Let /,/' G L2(T,/h) and g,g' G such that ||/||t = ||/'||t = 1 

and \\g\\-x. = Wg'W-x. = 1- Let k, k' G M. The following inequality holds: 

4 {i^fg, K'f'g') = (k - + kk' {dlif, f) + dlig, g') - 1/2 dl{f, f')dl{g, g')) . 

In this proof, we say that a set S{r]) is a ?7-net of an other set F in a metric space 
{E, d) if, for all y ^ V, there exists x e S{r]) such that d{x, y) < r]. 

Let us denote by Si (respectively the unit sphere of Vi (respectively V2). 
Let now for any r/ > 0, Si{r]) C Si (respectively S2{ri) C -S'2 ) be a 77- net of 
(respectively ^2) such that: 

V(/,5)gL2(T,/x)xL2(X,z.„), Vx>0, \Siirj)nBtif,xr])\ < {2x + 1)"'^''^ 

\S2ir])nB^ig,xv)\ < i2x + 1)"^^' 

where Bt{f,xr]) and Bx{g,xr]) are balls centered at / and g with radius xrj of the 
metric spaces (L^(T, /x), dt) and (L^(X, f„), dx) respectively. We refer to Lemma 4 
of Birge (2006) for the existence of these nets. Let now 

.(,)^U{>(/..)^^'.(^)x^=(^)}. 

First of all, S{r]) is a rj-net of V. Indeed, let (p eV. We can write ip{t, x) = Kf{t)g{x) 
with K G M+, / G S*! and g G 52- Let us define k = inf |i G N, ? > V^/ci/ry} and let 
if, 9') e Si{l/{V2k)) X S2il/{V2k)) such that 

cit(/,/')<i and d.ig,g')< ^ 



V2k '^'^'-\/2fe' 
Then, by Lemma 6.4, the application ip'{t, x) 

= ^f{t)g'ix) is such that d2 {(f, (f') < 

V- 

Furthermore, let be a function of S{r]) written as ip{t, x) = Kf{t)g{x). We want 
to bound the cardinality of the set S{rj) n B {nfg, xrj) (where B {nfg, xrj) is the ball 
centered at Kfg with radius xrj of the metric space (L^(T x X, M),d2)). Let then 
Lp' = n' f g' G S{r]) n B {nfg^xrj). We derive from Lemma 6.4 that: 

{K-K!f <dl{(p,ip') <x^rf. 
Consequently, $\kappa'$ has to be chosen among at most $2\sqrt{2}\, x + 1$ numbers. Afterwards,
assume that 

' " f{t)f{t)dtx{t) \ ( [ g{x)g' ix)dMx)] < 0. 



IT 

In this situation, S^^ip^ip') > + k'^. Since, k > and k! < k, + xt], for any 

x>2, ^ <1 + V2x < §x. Thus, dl{(p, (p')>^, and since 

||/-/'||2<4 and \\g-g'\\l<A, 

we have. 

This permits us to control the number of /' and g' possible. If now, 

[ /(t)/'(t)d^(t) > and / g{x)g'{x)dMx) > 0, 

we derive from Lemma 6.4 and from the elementary inequality x + y — l/2xy > 
l/2{x + y) for x,y G [0,2], that: 

(/^ - + ^ {dlif, f) + dl{g, g')) < 4 i'P, f') < xW, 
which leads to 

dUfJ') + dlig,g')<^-^<'^. 

Thus, 
Finally, if 

/ f{t)f'{t)dfi{t) < and / g{x)g'{x)diyn{x) < 0, 
it comes from the preceding calculus that 

Consequently, we have proved 

G S{v), Vx > 2, \S{v)nB{^,xr])\ < (2V2X + l) (l2a;2 + i)dimyi+dimy2 _ 

To prove the result, this inequality must hold for all ip G L^(T x X, M). If 99 G 
L2(TxX,M), maybe \S{r]) n B {(p,xr])\ = 0. If not, there exists G S{r))r\B {(p, xr]) 
and thus, 

\S{r^)nB{ip,xrj)\<\S{v)nB{p',2xv)\. 

Consequently, 

V(^gL2(TxX,M), Vx> 2, \S{r])nB{ip,xrj)\ < (4V2x + 1^ (48x2 + 1)^^°'^^+^'°'^' 
The conclusion ensues from the elementary inequality 

Vx>2, 48x2 + 1 < e^-^^'. 
□



6.3.2. Proof of Proposition 4-1- For any pair (Vi,V2) G Vi x V2, we define the set 
V by relation (10). Let then V be the collection of all V when (Vi, V2) varies among 
Vi X V2 . Let A be the application on V defined by 

A(y) = Ai(Fi) + A2(y2) 

when V corresponds to (Vi, V2)- We apply afterwards Theorem 2.1 with V and A 
to derive an estimator s such that 

^w.rrr2/ ^ ■ r f ,2 / /- tA dim + dim F2 + Ai (Fi) + A2 (^2) 
CE [H\s, s)] < mf <^ 4 (V^, V) + ^ 



Let thus KV1V2 G and let (^'1,^2) & Vi x V2 such that H^'iHt = \\v2\\x = 1- The 
preceding inequality implies 

, r 2/ '^M/j2/^ \, 2,2/ , ,s dimFi+diml^2 + Ai(14) + A2(F2) 
CE[H {s, s)\ < d2 (Vs, KV1V2) + K d2 (■"1^2, "^i '"2) "I 

Some calculus shows that 

4 {V1V2,V[V'2) < 2 {4 {vi,v[)+4 {V2,V'2)) . 

By taking the infimum over all v[ and V2, 

C"E[H\S,S)] < 4 {V~S, KVIV2) + K^4 (vi: S,) + ^ ^'^^'^ + K^4 {V2, S2) 

dimF2 + A2(F2) 



+ 



n 



where and ^2 are the unit spheres of Vi and V2 respectively. We conclude by 

using the fact that dt{vi,Si) < 2dt{vi,Vi), d2{v2,S2) < 2dx('y2jV2) and by taking 
the infimum over all (Vi, F2) G Vi x V2. □ 

6.3.3. Proof of Lemma 4.2. We derive from some calculus that for all $-1/2 < r \le b, b' \le R$,

\[ \int_0^1 \left( \sqrt{2b + 1}\, t^{b} - \sqrt{2b' + 1}\, t^{b'} \right)^2 dt = \frac{4\, (b - b')^2}{(1 + b + b') \left( \sqrt{2b + 1} + \sqrt{2b' + 1} \right)^2} \]

and thus

\[ \frac{(b - b')^2}{(1 + 2R)^2} \le \int_0^1 \left( \sqrt{2b + 1}\, t^{b} - \sqrt{2b' + 1}\, t^{b'} \right)^2 dt \le \frac{(b - b')^2}{(1 + 2r)^2}. \]

This concludes the first part of the proof. For the second part, for $b > 0$, we define

\[ g_b(t) = \sqrt{\frac{(2b)^{k+1}}{k!}}\; t^{k/2} e^{-bt}. \]

Then for $0 < r \le b, b' \le R$,

\[ \int_0^{+\infty} \big( g_b(t) - g_{b'}(t) \big)^2 dt = 2 \left( 1 - \left( \frac{2\sqrt{b b'}}{b + b'} \right)^{k+1} \right). \]

Consequently the lemma ensues from the inequality below:

\[ \frac{(b - b')^2}{8 R^2} \le 1 - \frac{2\sqrt{b b'}}{b + b'} \le \frac{(b - b')^2}{8 r^2}. \]



6.3.4. Proof of Proposition 4-3. We generalize Lemma 4.1 for some new spaces. The 
proof of the following lemma is analogue to the one of Lemma 4. 1 and will not be 
detailed. 

Lemma 6.5. Let Vi and Vi be some subsets of the unit sphere o/L^(T,/x) and 
L?{X,i'n) respectively. For each i G {1,2}, we assume that there exist Wt a subset 
of a D\Y^- dimensional normed linear space {W±, \ ■ \±), and a map $i from W± onto 
Vi such that: 

(19) y{x,y)eWu pi\x-y\i <dt{^i{x),^i{y)) < pi\x-y\i. 

(20) y{x,y)eW2, P2\x-y\2<dy.{^2{x),^2{y))<p2\x-y\2- 
The set 

V = {kviV2, {vi,V2) e Vi X V2, K > 0} 
has a finite metric dimension bounded by 



Dv = C 



Pi 
Pi 



+ D,;^.^ In ( 1 + 



P2 
P2 



where C is an universal constant. 

Lemma 6.6. Let for all bo < r < R, Vt{r, R) be the set defined by 

Ub 



Vt{r,R) = 



-, r<b<R}. 



Condition (19) holds with D-^^ = 1, pi = p{R) and pi = p{r). 
Lemma 6.7. For any integer p > and W G W, let 

Vg 



V2{W,R) 



\ve\ 



eW, 11^112 



There exists {W2, | • I2) a finite dimensional linear space and $2 0, map from W2 
onto V2{W,R) such that condition (20) holds with Dy^^ < dimVF, p2 = e~^P and 
P2 = e^P. ~ 
Proof of Lemma 6.7. For any integers $i, j \in \mathbb{N}^*$, let us denote by $\varphi_{i,j}$ the linear form
on defined by (pi,j{9) =< Xi — Xj,6 >. Let VFi = Cii^jKeKpij and let W2 such 
that W = Wi®W2 and such that <u,v >= for all {u,v) G VFi x W2. Since the 
functions of L^(X, f„) are defined f„-almost everywhere, the set V2{W,R) can be 
written as 

V2{W, R) = $2 {{0 e ll^ll < p}) where $2 W = 

\\ve\\x 

Let, for all x eX, '^^ be the function defined from X into R by ^x{d) = ^2{0){x). 
We derive from some calculus that the differential of at the point G W2 , denoted 
by d^'a;(0), satisfies 

yheR\ d^xjO) ■h= '^ <0,x.>^<0,x » (< X - X., » _ 

(^EILiexp(2<0,x, 

In particular, we have 

V/iGM'=2^ V| <x-a;i,^> I < \d^x(d) ■ h\ < — V I < x - x,, /i > 1. 

n n ^-^ 

i=l 1=1 

If we endow W2 with the norm | • I2 defined by 



ye G W2, \e\2 



\ 



1 " / 1 " 

r) 



»=1 \ J=l 



the mean value theorem leads to 

V(0i, ^2) e W2, e-3^|0i - 02I2 < 4 ($2(^1), $2(^2)) < e^^^'iei - 02I2, 
which concludes the proof. □ 

Thanks to Lemma 6.5, for all /j > 0, c < r < and G W the set 
V{r, R, W, q) = {aubve, a>Q,r<h<R,eeW, \\e\\ < g} 
has a metric dimension bounded by 

CDv^r,R,W,g) = I + QdimW + In (^1 + 

for some universal positive constant C. Let us define the collection V by 

Y = {V {bo + 1/r, bo + R,W,g),We W, r,R,ge N*} 

and the map A on V by 

A {V{r, R, W, g)) = A{W) + \n{2R^) + ln(2r2) + ln(2^2). 

Theorem 2.1 applied with (V, A) provides an estimator s, such that, for all W G W, 
all g,r,R G N*, all 0' e W such that ||^'|| < g, all a > 0, all 6 satisfying to 
bo + l/r <b<bQ + R, 

1 + g dim I^ + In f 1 + ^^^P^^m^ ] + Inr + IniZ + Ing 
C'K [H\s, s)] < 4{V~s, aum') + ^ '^^^^^^ 
where $C'$ is another universal positive constant. In particular, with $R = \inf\{i \in \mathbb{N}^*,\ i \ge b - b_0\}$, $r = \inf\{i \in \mathbb{N}^*,\ i \ge 1/(b - b_0)\}$ and $\varrho = \inf\{i \in \mathbb{N}^*,\ i \ge \|\theta'\|\}$, this inequality leads to
(1 V VdimPF) 



C"E[H'^{s,s)] < dl{y/s,aubV0> 



n 



+ -<!ln 

n 



1 V p 60 + 



6-60 + 1 



+ |ln(lAp(l + 6))| + |ln(6-6o)| 



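where $C''$ is another universal positive constant. For $\theta \in \mathbb{R}^{k_2}$, the triangular inequality leads to

\begin{align*}
d_2^2(\sqrt{s}, a u_b v_{\theta'}) & \le 2 \left[ d_2^2(\sqrt{s}, a u_b v_\theta) + d_2^2(a u_b v_\theta, a u_b v_{\theta'}) \right] \\
& \le 2 \left( d_2^2(\sqrt{s}, a u_b v_\theta) + a^2 \|u_b\|_t^2\, d_x^2(v_\theta, v_{\theta'}) \right).
\end{align*}

Some calculus shows that $d_x(v_\theta, v_{\theta'}) \le e^{\|\theta\| \vee \|\theta'\|}\, \|\theta - \theta'\|$. We conclude by choosing $\theta'$ the projection of $\theta$ on $W$, and by taking the infimum over all $W \in \mathbb{W}$. □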

6.4. Proofs of Section 5. 

6.4.1. Proofs of Lemmas 5.1 and 5.2. 

Proof of Lemma 5.1. We derive from some calculus that for 02, ^2 ^ [~ 1/2+1/^2, +00 [ 
Thus, for (^1,^2), (01,^2) ^ -^n,i2, by the triangular inequality 



~ ^20^ + 1 ^(l + 202)(l+02 + 02)(l + 202) 

□ 

Proof of Lemm,a 5.2. We derive from some calculus that for ^2,^2 ^ 1/^2, and g{x) = 

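\[ \int \left(\sqrt{t^{k-1} e^{-\theta_2 t}} - \sqrt{t^{k-1} e^{-\theta_2' t}}\right)^2 dt = (k-1)! \left(g(\theta_2) + g(\theta_2') - 2 g\!\left(\frac{\theta_2 + \theta_2'}{2}\right)\right). \]
Consequently, by the mean value theorem,

\[ \int \left(\sqrt{t^{k-1} e^{-\theta_2 t}} - \sqrt{t^{k-1} e^{-\theta_2' t}}\right)^2 dt \leq (k-1)! \sup_c |g''(c)|\, |\theta_2 - \theta_2'|^2. \]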

The conclusion follows from the triangular inequality as in the proof of the preceding 
lemma. □ 
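The identity used in the proof of Lemma 5.2 can be checked numerically. The sketch below rests on two assumptions that are not legible in the text above: the integral is taken over $[0, +\infty)$, and $g(x) = x^{-k}$, which is the choice that makes the identity hold through $\int_0^\infty t^{k-1} e^{-\theta t}\,dt = (k-1)!\,\theta^{-k}$. The function names and the values of `k`, `a`, `b` are illustrative only.

```python
import math

def closed_form(k: int, a: float, b: float) -> float:
    """(k-1)! * (g(a) + g(b) - 2 g((a+b)/2)) with the assumed g(x) = x**(-k)."""
    g = lambda x: x ** (-k)
    return math.factorial(k - 1) * (g(a) + g(b) - 2.0 * g((a + b) / 2.0))

def numeric_integral(k: int, a: float, b: float,
                     upper: float = 60.0, steps: int = 100_000) -> float:
    """Trapezoidal approximation of the squared-root-difference integral on [0, upper]."""
    h = upper / steps

    def integrand(t: float) -> float:
        root_a = math.sqrt(t ** (k - 1) * math.exp(-a * t))
        root_b = math.sqrt(t ** (k - 1) * math.exp(-b * t))
        return (root_a - root_b) ** 2

    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, steps))
    return total * h

k, a, b = 3, 1.0, 2.0
exact = closed_form(k, a, b)
approx = numeric_integral(k, a, b)

# Second-order bound: g''(x) = k (k+1) x**(-k-2) is largest at min(a, b)
# when the supremum is restricted to the segment between a and b.
sup_g2 = k * (k + 1) * min(a, b) ** (-k - 2)
bound = math.factorial(k - 1) * sup_g2 * (a - b) ** 2

print(exact, approx, bound)
```

A Taylor expansion around the midpoint gives the sharper factor $(\theta_2 - \theta_2')^2 / 4$; the looser bound is used here because it is the one that appears in the proof.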
6.4.2. Proof of Theorem 5.1. We start with the following proposition.

Proposition 6.1. Assume that Assumption 5.1 holds and let $K \in \mathcal{K}$. Let for any
$1 \leq j \leq k$, $W_j$ be a linear subspace of $L^2(X, \nu_n)$ with finite dimension and $Z_j$
be a bounded subset of $W_j$. Let then $\rho = (\rho_1, \ldots, \rho_k)$ be such that for all $j \in \{1, \ldots, k\}$,
$Z_j \subset B_x(0, \rho_j) = \{g \in L^2(X, \nu_n),\ \|g\|_x \leq \rho_j\}$. Let $\pi_K$ be the projection on $K$ in $\mathbb{R}^k$
for the distance $d_r$ defined by



V0, ^' G M^ dr{e, e') = Rj{K) \ej - & 



The set

\[ V = \left\{(t, x) \mapsto f_{\pi_K(u(x))}(t),\ u \in \prod_{j=1}^k Z_j\right\} \]

has a metric dimension bounded by



Dviv) = ^ V 1 X: In (l + 2 (^^) Pi) dim(T^,)- 

Remark: $K$ is a closed set for the usual topology of $\mathbb{R}^k$. It is straightforward to
verify that $K$ is still a closed set for the metric space $(\mathbb{R}^k, d_r)$. This shows that the
projection on the closed convex set $K$ always exists.



Proof of Proposition 6.1. As in the proof of Lemma 4.1, we say that a set $S(\eta)$ is an
$\eta$-net of another set $F$ in a metric space $(E, d)$ if, for all $y \in F$, there exists $x \in S(\eta)$
such that $d(x, y) \leq \eta$. For the sake of simplicity, we shall write throughout this proof

$R_j = R_j(K)$, $\alpha_j = \alpha_j(K)$, and $\pi = \pi_K$.

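Let for all $j \in \{1, \ldots, k\}$, $\eta_j > 0$, and $Z'_j(\eta_j)$ be a maximal subset of $Z_j$ such that
$d_x(x, y) > \eta_j$ for all $x \neq y \in Z'_j(\eta_j)$. This is an $\eta_j$-net of $Z_j$ such that, by Lemma 4 of
Birgé (2006),

\[ |Z'_j(\eta_j)| \leq |Z'_j(\eta_j) \cap B_x(0, \rho_j)| \leq \left(\frac{2\rho_j}{\eta_j} + 1\right)^{\dim W_j}. \]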
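The fact that a maximal separated subset is automatically a net can be illustrated on a finite example. The sketch below is not the construction of the paper: it replaces $Z_j$ by a hypothetical finite sample of the Euclidean ball of radius `rho` in dimension `dim`, builds a maximal `eta`-separated subset greedily, and compares its size with the volumetric bound $(2\rho/\eta + 1)^{\dim}$.

```python
import math
import random

def maximal_separated_subset(points, eta):
    """Greedy construction: keep a point only if it is farther than eta from every kept point."""
    kept = []
    for p in points:
        if all(math.dist(p, q) > eta for q in kept):
            kept.append(p)
    return kept

random.seed(0)
dim, rho, eta = 2, 1.0, 0.25

# Hypothetical finite stand-in for Z_j: random points of the ball of radius rho.
points = []
while len(points) < 2000:
    p = tuple(random.uniform(-rho, rho) for _ in range(dim))
    if math.hypot(*p) <= rho:
        points.append(p)

net = maximal_separated_subset(points, eta)

# A rejected point lies within eta of some kept point, so the subset is an eta-net.
covered = all(min(math.dist(p, q) for q in net) <= eta for p in points)

# Volumetric bound on the size of an eta-separated subset of the ball.
bound = (2.0 * rho / eta + 1.0) ** dim

print(len(net), bound, covered)
```

The bound comes from packing disjoint balls of radius $\eta/2$ around the kept points inside a ball of radius $\rho + \eta/2$.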



Note that the set

\[ S(\eta) = \left\{(t, x) \mapsto f_{\pi(u(x))}(t),\ u \in \prod_{j=1}^k Z'_j\!\left((\eta/(kR_j))^{1/\alpha_j}\right)\right\} \]

is an $\eta$-net of $V$. Indeed, let $f \in V$ be the function of the form $f(t, x) = f_{\pi(u(x))}(t)$ and,
for any $1 \leq j \leq k$, let $v_j \in Z'_j\!\left((\eta/(kR_j))^{1/\alpha_j}\right)$ such that $\|u_j - v_j\|_x \leq (\eta/(kR_j))^{1/\alpha_j}$.
We define $v = (v_1, \ldots, v_k)$ and $g \in S(\eta)$ by $g(t, x) = f_{\pi(v(x))}(t)$. Then,

\[ \|f - g\|_2^2 = \frac{1}{n} \sum_{i=1}^n \left\|f_{\pi(u(x_i))}(\cdot) - f_{\pi(v(x_i))}(\cdot)\right\|_t^2 \]

n 



1=1 
^71 k 



\20LA 



< V ■ 

Moreover, for all $x \geq 2$ and $\varphi \in L^2(T \times X, M)$,



\S{ri)nB{^,xr,)\ < J] 

k 



kR-i 



s n 2 



kR, 



1/a, 



A\ra{Wj) 



< expfi^dimTOln(^2(^y^"%, + ljx 



which leads to the result. 



□ 



Lemma 6.8. Assume that there exist $k \in \mathbb{N}^*$, $a, b \in (\mathbb{R}_+)^k$ with $\max_{1 \leq j \leq k} a_j \geq 1$
and $\min_{1 \leq j \leq k} b_j \geq 1$ such that



V 



Vr?>0, L>v(?7)<^ailn 1 + 



i=l 



Then, there exists a universal positive constant $C$ such that



+ 



In 1 + 



n 



n 



n 



Efc 



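Proof of Lemma 6.8. If one increases $D_V$, $\eta_V$ will increase. Consequently, without
loss of generality we can assume that

\[ D_V(\eta) = \begin{cases} 2\sum_{i=1}^k a_i \ln(2 b_i) + 2\left(\sum_{i=1}^k a_i\right) \ln(1/\eta) & \text{if } \eta < 1, \\ 2\sum_{i=1}^k a_i \ln(2 b_i) & \text{otherwise.} \end{cases} \]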



Note that for all $\alpha, \beta, \gamma > 0$, the equation

\[ \alpha + \beta \ln x = \frac{\gamma}{2x^2} \]

has only one solution $x$ given by

\[ x = \sqrt{\frac{\gamma}{\beta\, W\!\left(\frac{\gamma}{\beta}\, e^{2\alpha/\beta}\right)}}, \]

where $W$ is the Lambert function, defined as the inverse function of $t \mapsto t e^t$. Consequently, by setting

\[ \alpha = \sum_{i=1}^k a_i \ln(2 b_i) \quad \text{and} \quad \beta = \sum_{i=1}^k a_i, \]



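we derive that the positive number $\bar{\eta}$ defined by

\[ \bar{\eta} = \begin{cases} \sqrt{\dfrac{\beta}{n}\, W\!\left(\dfrac{n}{\beta}\, e^{2\alpha/\beta}\right)} & \text{if } n > 2\alpha, \\[2ex] \sqrt{\dfrac{2\alpha}{n}} & \text{if } n \leq 2\alpha, \end{cases} \]

satisfies $D_V(\bar{\eta}) = n \bar{\eta}^2$. In particular, $\eta_V \leq \bar{\eta}$. The conclusion follows from some
elementary inequalities.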

□ 
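The fixed-point relation $D_V(\bar{\eta}) = n\bar{\eta}^2$ used in the proof of Lemma 6.8 can be checked numerically. The sketch below assumes the reading $D_V(\eta) = 2\alpha + 2\beta\ln(1/\eta)$ for $\eta < 1$ and $D_V(\eta) = 2\alpha$ otherwise, with $\alpha = \sum_i a_i \ln(2b_i)$ and $\beta = \sum_i a_i$. The helper names `lambert_w0`, `d_v` and `eta_bar` and the numerical values are illustrative, and the Lambert function is computed by plain Newton iterations rather than by a library call.

```python
import math

def lambert_w0(z: float, tol: float = 1e-14) -> float:
    """Principal branch of the Lambert function for z >= 0 (Newton on w * exp(w) = z)."""
    w = math.log1p(z)  # (1 + z) * ln(1 + z) >= z, so the start lies to the right of the root
    for _ in range(100):
        step = (w * math.exp(w) - z) / (math.exp(w) * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w

def d_v(eta: float, a, b) -> float:
    """Assumed form of D_V: 2*alpha + 2*beta*ln(1/eta) if eta < 1, else 2*alpha."""
    alpha = sum(ai * math.log(2.0 * bi) for ai, bi in zip(a, b))
    beta = sum(a)
    if eta < 1.0:
        return 2.0 * alpha + 2.0 * beta * math.log(1.0 / eta)
    return 2.0 * alpha

def eta_bar(n: float, a, b) -> float:
    """Solution of D_V(eta) = n * eta**2 under the assumed form of D_V."""
    alpha = sum(ai * math.log(2.0 * bi) for ai, bi in zip(a, b))
    beta = sum(a)
    if n > 2.0 * alpha:
        z = (n / beta) * math.exp(2.0 * alpha / beta)
        return math.sqrt(beta * lambert_w0(z) / n)
    return math.sqrt(2.0 * alpha / n)

a, b = [1.0, 2.5], [1.0, 3.0]
for n in (5, 1000):
    eta = eta_bar(n, a, b)
    print(n, eta, d_v(eta, a, b), n * eta ** 2)
```

The case $n > 2\alpha$ is exactly the one in which the solution falls below 1, which is why the two branches of $D_V$ and of $\bar{\eta}$ match.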

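Proof of Theorem 5.1. Let us denote, for all $W_j \in \mathbb{W}_j$ and all integers $\rho_j \in \mathbb{N}^*$, the
set $Z_j(W_j, \rho_j) = W_j \cap B_x(0, \rho_j)$. Then, for all $W = (W_1, \ldots, W_k) \in \prod_{j=1}^k \mathbb{W}_j$,
$\rho = (\rho_1, \ldots, \rho_k) \in (\mathbb{N}^*)^k$, and $K \in \mathcal{K}$, let

\[ V_K(W, \rho) = \left\{(t, x) \mapsto f_{\pi_K(u(x))}(t),\ u \in \prod_{j=1}^k Z_j(W_j, \rho_j)\right\}. \]

We define

\[ \mathbb{V} = \left\{V_K(W, \rho),\ W \in \prod_{j=1}^k \mathbb{W}_j,\ \rho \in (\mathbb{N}^*)^k,\ K \in \mathcal{K}\right\} \]

and we define $\Delta$ on $\mathbb{V}$ by

\[ \Delta(V_K(W, \rho)) = \sum_{j=1}^k \left(\Delta_j(W_j) + \ln(2\rho_j^2)\right) + \Delta_{\mathcal{K}}(K). \]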

We apply Theorem 2.1 with $(\mathbb{V}, \Delta)$ to derive an estimator $\hat{s}$ such that, for all $W =
(W_1, \ldots, W_k) \in \prod_{j=1}^k \mathbb{W}_j$, $\rho = (\rho_1, \ldots, \rho_k) \in (\mathbb{N}^*)^k$ and $K \in \mathcal{K}$,

A(yx(W,p)) 



+ 



n 



where $C$ is a universal positive constant. It follows from Proposition 6.1 and
Lemma 6.8 that

<- 1 E (^^) (in (1 . ^w^n - 
where $C$ is a universal positive constant. In particular, for all $f \in \mathcal{F}$ of the form
$f(t, x) = f_{u(x)}(t)$, for all $K$ such that $u(X) \subset K$, for all maps $v = (v_1, \ldots, v_k) \in
\prod_{j=1}^k W_j$ such that for all $1 \leq j \leq k$, $\|v_j\|_x \leq \|u_j\|_x$, and $g$ of the form $g(t, x) =
f_{\pi_K(v(x))}(t)$, the preceding inequality applied with $\rho_j = \inf\{i \in \mathbb{N}^*,\ i > \|u_j\|_x\}$ leads
to

C"E[H\s,s)] < dl{^j)+dl{f,g) 

H ^ 

n 

k 

i . 

+ - 



7, 



It ensues from Assumption 5.1 that 

k 

dl{f,g)<kY,RjiKfhj- 

i=i 

We conclude by choosing $v_j$ the projection of $u_j$ on $W_j$ in the space $L^2(X, \nu_n)$. □

Acknowledgement: Many thanks to Yannick Baraud for his suggestions, comments 
and careful reading of the paper. 